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ABSTRACT 


In recent years there has been an increased interest in effective individual control 
and enhanced security measures, and face recognition schemes play an important role in 
this increasing market. In the past, most face recognition research studies have been con- 
ducted with visible imaging data. Only recently have IR imaging face recognition studies 
been reported for wide use applications, as uncooled IR imaging technology has im- 
proved to the point where the resolution of these much cheaper cameras closely ap- 
proaches that of cooled counterparts. This study is part of an on—going research con- 
ducted at the Naval Postgraduate School which investigates the feasibility of applying a 
low cost uncooled IR camera for face recognition applications. This specific study inves- 
tigates whether nonlinear kernel—based classifiers may improve overall classification 
rates over those obtained with linear classification schemes. The study is applied to a 50 
subject IR database developed in house with a low resolution uncooled IR camera. Re- 
sults show best overall mean classification performances around 98.55%, which repre- 
sents a 5% performance improvement over the best linear classifier results obtained pre- 
viously on the same database. This study also considers several metrics to evaluate the 
impacts variations in various user—specified parameters have on the resulting classifica- 
tion performances. These results show that a low-cost, low-resolution IR camera com- 


bined with an efficient classifier can play an effective role in security related applications. 
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EXECUTIVE SUMMARY 


Numerous face recognition studies have been reported in the literature over the 
years due to their increasing usage in security, access control, biometric and other types 
of applications. In the past most research studies have been conducted in visible imaging 
data and only recently have IR imaging face recognition studies been reported. IR imag- 
ing offers the main advantage over visible imaging of being invariant to illumination 
changes. Good resolution IR imaging has been restricted to very expensive cooled de- 
vices which slowed the interest in IR face recognition down until the recent past. How- 
ever, technological advances have increased the resolution of uncooled IR cameras to the 
point that their resolution closely approaches that of cooled devices at a fraction of the 
cost. This study extends research conducted earlier by Pereira [Pereira, 2002] and Lee 
[Lee, 2004], who collected an infrared (IR) face database using a low resolution IR cam- 
era and considered linear classifier approaches. This study explores the performance of a 
kernel—based nonlinear pattern classification algorithm applied to the same 50 subject 
database used by Lee. 

The nonlinear kernel—based Generalized Discriminant Analysis (GDA) consid- 
ered here is basically an extension of the Linear Discriminant Analysis (LDA), where the 
data is implicitly projected into a new higher dimensional nonlinear transformed domain 
prior to the application of the LDA step. As a result, the GDA approach takes advantages 
of the nonlinear characteristics of data, which LDA does not have access to for higher 
classification rates. 

First, this study investigates several distance measures and selects the “best” one 
to apply to the data under investigation. It also considers various user-specified parame- 
ters such as the specific type of kernel function, the number of eigenvectors selected in 
the GDA step and investigates their resulting impacts on overall mean classification per- 
formances. As part of the study, the concept of distance maps is defined to visually 
evaluate and compare classifier performances, and apply concepts of confidence inter- 
vals, confidence scores, and rank—scores to assist the user in designing the “best” classi- 


fier. 


XV 


The study implements a cross—validation variant to insure results derived are sta- 
tistically meaningful, and software implementation steps are vectorized to minimize the 
associated computational load. Results show that the GDA implementation, when select- 
ing the Mahalanobis Angular distance, a polynomial kernel of degree 2 and the 500 top 
eigenvectors for the GDA projection step, provides the best mean classification results 
(98.55%). These results correspond to a 5% improvement over the best derived with the 
linear Fisherface implementation, and show that significant improvement may be gained 
by implementing nonlinear kernel—based schemes. Results also show that a low-cost, 
low-resolution IR camera combined with an efficient classifier can play an effective role 


in security related applications. 


XVi 


I. INTRODUCTION 


Numerous face recognition studies have been reported in the literature [Adini, 
1997; Hearst, 1998; Liu, April 2004] over the years due to their increasing usage in secu- 
rity, access control, biometric and other types of applications. In the past, most research 
studies have been conducted in visible imaging data and only recently have infrared (IR) 
imaging face recognition studies been reported [Pereira, 2002; Chen, 2003; Socolinsky, 
2003; Lee, 2004]. IR imaging offers the main advantage over visible imaging of being 
invariant to illumination changes [Chen, 2003]. Until the recent past, good resolution IR 
imaging was restricted to very expensive cooled cameras which lowered the interest in IR 
face recognition. However, technological advances have increased the resolution of un- 
cooled IR cameras to the point that their resolution closely approaches that of cooled 


cameras at a fraction of the cost. 


This study is an extension to the research conducted earlier by Pereira [Pereira, 
2002] and Lee [Lee, 2004]. First, Pereira initialized the study. He developed an uncooled 
IR image capture system, generated a small database of 14 adult subjects, collected in a 
controlled indoor environment, and implemented two classic linear classification algo- 
rithms using Matlab, the Principal Component Analysis (PCA), and the Fisherface 
[Pereira, 2002]. Later Lee extended the database to 50 adult subjects and applied the 
same two linear classification algorithms [Lee, 2004]. This study extends the previous 
research to a kernel—based nonlinear classification approach, the Generalized Discrimi- 
nant Algorithm, (GDA) and compares classification performances with those obtained 


using linear implementations. 


The following sections briefly reviews the concepts of IR radiation, followed by a 
presentation of the procedures used for image collection. 
A. THEORETICAL BACKGROUND 

The IR radiation is the section of the electromagnetic spectrum whose wave- 
lengths lie primarily between 1-100 um. This wavelength range is above the visible 
spectrum range and below the microwave range [Dereniak, 1996]. Indicatively, Fig. 1.1 


illustrates the spectrum, marking the ultraviolet, visible and IR regions. 


1 


Spectrum: Ultraviolet to Infrared 





0.1 meron 100 micron 


Figure 1.1. | Visible Spectrum versus the Ultraviolet and the IR Ranges (From [Sierra 
Pacific Corp., 2004].). 


Though the IR radiation is not visible, it can be perceived in the form of thermal 
energy, as all objects which have temperatures greater than 0 K emit thermal radiation. 
The amount of thermal energy an object emits depends on both the fourth power of its 
absolute temperature and its emissivity. Emissivity of an object indicates to which degree 
it emits thermal radiation, and it takes a value between 0 and 1. The larger the emissivity 
of an object is, the larger the thermal radiation it emits, and is usually a characteristic de- 
termined by the material structure of the object’s surface (i.e., its molecular characteris- 
tics). A blackbody is defined to be the object that has emissivity value equal to 1. This 


ideal case of an object does not exist in nature. 


The human body has an emissivity of about 0.98, at 310 K [Jones, 1998]. Addi- 
tionally, each individual has a vein and tissue structure which is peculiar to him/her, re- 
sulting in the unique emission of a thermal radiation pattern. The temperature variation 
across the face of a typical person is about 7° C. Since the spatial variation of one’s facial 
temperature profile is not very abrupt, the low spatial resolution of the IR cameras gener- 
ally should not affect the quality of the images. For this reason, the IR radiation emitted 
by the human face is ideal for high classification performance scores in face recognition 


applications. 


An additional advantage of using IR images as compared to visible images for 
face recognition applications is that IR images are not affected by variations in the light- 
ing of a particular environment while visible images tend to lose much of their informa- 
tion in poorly lit conditions [Chen, 2003]. However, apart from the low resolution, the 
main drawback of using IR images is the relatively high cost of IR cameras (purchase and 


maintenance). This additional cost is due to the fact that cooled IR cameras require cryo- 
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genic cooling in order to achieve high temperature resolution. In order to make such im- 
agery more cost-effective, recent advances in uncooled infrared camera technology pro- 


vide a less expensive means of acquiring infrared images. 


Apart from face recognition applications, IR imaging has also been successfully 
used in many civilian and military applications such as Forward Looking Infrared (FLIR) 
systems for fighter aircrafts (F—14, F—16, F—18, etc.), helicopters, and IR targeting sys- 
tems—missiles which use IR radiation to locate the targets [Sierra Pacific Corp., 2004]. 

B. EQUIPMENT USED 

The IR face images were collected at the Naval Postgraduate School (NPS), using 
an uncooled IR camera (IR 160), manufactured by Infrared Solutions. 

Even though the spatial resolution (160x120 pixels) and temperature resolution 


(60 mK) provided by the camera are relatively low, the wavelength region in which it op- 


erates (8—14 1m) is critical for capturing wavelengths emitted by human body at about 
310 K (which corresponds to peak wavelength of 10 um). This phenomenon is illustrated 


in Fig. 1.2. 
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Figure 1.2. Blackbody spectrum for temperatures near that of the human body (From 
[Pereira, 2002].). 


The camera used here is said to be uncooled because there is no requirement to 
cool the camera’s sensor, which is a requirement for ordinary cooled cameras. The de- 


tailed specifications of the camera can be found in [Infrared Solutions Inc., 2004]. 


The IR—160 is connected to a monitor to allow for previewing the IR images be- 
fore their capture. The camera’s output produces IR images in a digital format. The digi- 
tal image is then transmitted to a PC via an 8-bit, RS—232 serial port. Next, the collected 
images are cropped [Lee, 2004], reshaped in a column—vector format, and inserted in the 


database matrix (this process will be discussed further in Chapter IV). 


The equipment used in the IR database collection is illustrated in Fig. 1.3. The de- 
tailed, geometric layout of the exact location of the individual posing for capturing his 
picture, his/her gazing directions towards marked spots on the facing wall, and the loca- 
tion of the pieces of equipment used (IR camera, monitor, PC) are shown in Figs. 1.4 and 


1.5. 


The nine spots on the wall, (forming a square with the center marked on it), corre- 
spond to the gazing positions of the individual, having a specific facial expression (neu- 
tral, smiling, or pronouncing the vowel “u”), and for which images were captured. In ad- 
dition, a tenth image was collected with random gazing position, located within the de- 
fined square. Each group of ten images constitutes one out of three possible sections (fa- 
cial expressions) collected for a given subject (class). Additional details regarding the 


organization of the IR image collection can be found in [Lee, 2004]. 





Figure 1.3. Equipment components used for IR data collection (From [Lee, 2004].). 
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Figure 1.4. Lateral aspect of IR data collection system layout (From [Pereira, 2002].). 
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Figure 1.5. Forward aspect of IR data collection system layout (From [Pereira, 
2002].). 


C. THESIS OVERVIEW 

This thesis consists of four chapters. Chapter I presents the basic idea behind the 
work conducted. Chapter II reviews the concepts of linear classifiers. Extensions to 
nonlinear classification schemes are described in Chapter III. Next, Chapter IV presents 
the experiments conducted on the face database and discusses results obtained. Finally, 
conclusions and recommendations for further research are presented in Chapter V. 
Mathematical details can be found in Appendix A. The software implemented during the 
study is included in Appendix B. Last, Appendix C includes typical performance meas- 
urement plots. 
D. SUMMARY 

This chapter presented the principles of IR imaging for face recognition applica- 
tions. It also discussed some of the advantages of using IR over visible imaging and it 
presented some of the technical characteristics of the uncooled, IR-160 camera used in 
this study, together with some characteristics of the equipment used for the IR data col- 
lection. The following chapter presents the linear classification algorithms considered for 
benchmarking purposes against the nonlinear kernel based classifier approach, which is 


the main focus of this thesis. 


I. LINEAR CLASSIFICATION ALGORITHMS 


This chapter first presents a brief introduction to two classical linear classification 
schemes widely used in the pattern recognition community, the Principal Component 
Analysis (PCA) and the Linear Discriminant Analysis (LDA). Next, it presents a brief 
comparison between the two schemes. 

A. INTRODUCTION 

The PCA and the LDA schemes have been used extensively and successfully in 
applications for image and speech recognition, machine learning, financial planning, and 
various other scientific areas [Martinez, 2001]. The database considered in the present 
face recognition study includes 50 adult subjects (1.e., classes), each having thirty infrared 
images. Identical samples and images were used by Lee in a study conducted in 2004 
[Lee, 2004]. Two main types of class assignments are typically available in pattern rec- 
ognition applications, namely the “closed set” and the “open set” implementations. 
Whereas the closed set implementation assumes that all testing trials belong to the data- 
base used to design the classification algorithm, no such assumption is made in the open 
set implementation. As a result, testing data trials may not necessarily be assigned to a 
given class in open set implementations. The present study was restricted to the closed set 


implementation type, 1.e., all individuals tested were considered to be in the training da- 


tabase. 

B. PRINCIPAL COMPONENT ANALYSIS (PCA) — EIGENFACE IMPLE- 
MENTATION 
1. Introduction 


Karl Pearson first introduced the Principal Component Analysis in 1901 [Lee, 
2004]. The PCA method was created initially for data compression, but has also been ex- 
tensively used with success in pattern recognition applications. The basic idea behind 
PCA is to represent features extracted from the data in terms of the linearly independent 
eigenvectors obtained from the data’s covariance matrix. Dimension reduction is ob- 
tained by approximating the feature vectors using a subset selected from the top eigen- 
vectors (i.e., those associated with the eigenvalues with larger magnitude), thereby pre- 


serving most of the feature variance information. The PCA approach has been used ex- 


tensively in classification applications even though it is, theoretically, not well-suited to 
deal with such problems because the ability to discriminate between individuals may of- 
ten be contained in the small details (i.e., associated with eigenvectors with smaller mag- 
nitudes). 

2 Algorithm Description 

The PCA algorithm (also known as “Eigenspace Projection’, or “Karhunen Lo- 
eve” decomposition) seeks a linear projection direction which best represents the data in 
a norm sense [Fargues, 2001]. Research by [Yambor, 2000] presents a good illustration 
of this process and is discussed next. Recall that all available P images are stored in ma- 


trices of dimension (mx 7n). First, these images are reshaped into a set of column vectors 


eS : of length VN = mn. Next, the reshaped vectors are stacked column—wise, to form 


iets, 


the (NV x P) dimensional training image data matrix X = Es ak |: As a result, the over- 


all mean image of length N is given by: 


1 
m=— >) x, (2.1) 


which leads to the corresponding i” reshaped image vector, defined by the equation: 


—_ i ee ea aes Vi 
NS m= | ea, Mees Xp | (2.2) 


As aresult, the centered (N x P) dimensional training data matrix X is given by 


X =[x' 1x? 1x71... 187], (2.3) 


where the vertical bars denote that the column vectors {x! es pare appended in se- 


quence, in order to form the matrix .Y.The N-dimensional data covariance matrix is 
given by: 
Q= XX". (2.4) 


Note that the covariance matrix, Q, has up tomin(P, V) non zero eigenvalues. Next, also 


recall that the eigenvector associated with the largest eigenvalue is that associated to the 
direction with the highest energy, and so on. Therefore, the PCA projection matrix V is 


made up from the top K eigenvectors v,, where K is user—specified, as: 


V =[v, lv, |....1¥,]. (2.5) 


As a result, the projected centered data Y can be expressed as: 
W=VTX, (2.6) 


Next, the projected data is used to compute the projected “class—centroids” as the 
means of the projected data associated with each class resulting in a class—centroid matrix 


of dimension (K xC), where C is the number of classes. Lastly, testing images are pro- 


jected onto the smaller dimensional space using the projection matrix, V, defined with the 
training data. The classification decision is obtained by selecting as the class that one 
which is closest in norm to the projected testing image. Various types of norms may be 


applied, and in most cases, the simple Euclidean distance (Norm-—2) is selected. 


According to [Belhumeur, 1997], the PCA method yields projection directions 
that maximize the total scatter across all images. In order to find these projections, the 
training data covariance matrix has to be calculated and its eigenvalue—eigenvector de- 
composition performed. Eigenvectors associated with the top eigenvalues in magnitude 
constitute the projections’ directions mentioned above. However, PCA still retains un- 
wanted information, e.g., that due to facial expressions and lighting. According to a study 
reported in [Adini, 1997], intra—class variations that were generated by altering the direc- 
tion of one’s gaze and lighting conditions may be larger than inter—class variations. As 
discussed in [Pereira, 2004], classification errors may be reduced by discarding the first 
few top eigenvectors when dealing with a small IR database. However, Lee showed that 
such a trend does not extend as the database size [Lee, 2004]. Nevertheless, it remains 
true that PCA is not optimally designed to discriminate between classes. However, PCA 
can help to reduce problems associated with dimensionality due to its dimension reduc- 


tion capability. This process will be more fully described in the next section. 


Implementation of the PCA method requires defining the “Total—Scatter” matrix, 


S, (.e., the data covariance matrix) via the following equation: 


Sp = yee: — U(X, ah (2.7) 


where x, is an image reshaped as a column vector and w is the mean of all training im- 


ages. 


The projection vectors matrix, W, 


opt 1S Chosen in such a way that it maximizes the 


determinant of the total scatter matrix of the projected training data matrix according to: 


W,,, =arg(max, |W"S,W |) =[w,w,,...w,, |, (2.8) 


opt m 


where w,,W,,...,W,, Constitute the set of N—dimensional projection eigenvectors (sorted 


by descending eigenvalue magnitude that are obtained from S, ). 


The PCA is implemented algorithmically using the following steps: 


Extract feature vectors from data images, which are obtained from the sub- 
jects and reshaped into a column—vector format. 


Organize the training data into a matrix form (training—data—matrix) by 
appending the column vectors, which represent the various training im- 
ages, such that images of the same class are appended together. Conse- 
quently, all operations take place in a column—wise fashion, in the various 
steps of the algorithm. 


Calculate the covariance matrix of the training data matrix. 
Calculate the eigendecomposition of the covariance matrix. 


Sort the eigenvalues in decreasing order of their magnitude, along with 
their associated eigenvectors. Keep all eigenvectors that are associated 
with a particular eigenvalue magnitude above a user-defined threshold. 


Project the training data matrix onto the eigenvector subspace. 


Calculate the class centroids from the projected training data. The class 
centroids are used to characterize each specific class. 


Project the testing data onto the eigenvectors subspace. 


Calculate distances between each projected testing image to all class cen- 
troids. 


Assign each testing image to a specific class by selecting the class, which 
corresponds to the smallest distance between the projected testing and all 
previously—computed training class centroids. 


10 


C. FISHERFACE APPROACH 

1. Introduction 

The Linear Discriminant Analysis (LDA) algorithm, also known as Fisher Linear 
Discriminant (FLD) analysis, is based on a linear projection of the data onto a lower di- 
mensional feature space, which identifies the projection that best discriminates among 
classes, rather than those directions that best represent the data [Martinez, 2001]. As a 
result, the LDA approach is less affected than PCA by variations in lighting and face ex- 
pression [Belhumeur, 1997]. 

2: Algorithm Description 

The LDA approach uses the concept of intra—class (or ““Within—Class”’) and inter— 
class (or “Between—Class”’) scatter matrices [Yambor, 2000]. The Within—Class—Scatter 


matrix (WCS) S, for class i is defined as: 


S,= >) @-m)(x-m,)’, (2.9) 


xeX; 


where m, is the mean image of class 7. The overall WCS matrix, S,,, is defined as fol- 


Ww? 


lows: 
Cc 
ae (2.10) 


where C is the number of classes. 


In a similar fashion, the Between—Class—Scatter matrix (BCS) S, is defined as: 
C 
oe =>°n,(m,-—m)(m,-m)', (2.11) 
i=l 


where m is the mean of all the training images and n, is the number of images in the 7“ 
class. 

The LDA projection direction matrix, W,,, is defined as the matrix which maxi- 
mizes the ratio of the determinant of the Between—Class—Scatter S, to the determinant of 
the Within—Class—Scatter S,,, both defined on the training data [Belhumeur, 1997]. This 


is denoted by the equation, known as the “Rayleigh quotient”: 
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Wo. = gma (2.12) 


opt 


IWS ,W | 
"IWS, WI) 

Appendix A.1 shows that finding the solution for a particular Rayleigh quotient is 
equivalent to solving the following generalized eigenvalue—eigenvector problem, (pro- 
vided that the WCS matrix S,, is non-singular): 


S,Ww, =AS_w,,i =1,2,...k, (2.13) 


w 


where A, is the 7 generalized eigenvalue, w, is thei” generalized eigenvector (obtained by 
solving the above generalized eigenproblem), and W,,, =[w,,w,,....w,] 1s the eigenvector 


matrix solution. Recall that there is a maximum potential occurrence of C —Inonzero 
generalized eigenvalues, as reviewed in Appendix A.2 [Belhumeur, 1997]). These C —1 
nonzero eigenvectors constitute the Fisher basis vectors, and are typically sorted by order 


of decreasing eigenvalues. 


Note that the N—dimensional WCS matrix S,, has a maximum rank equal to 
P—C, where P is the number of images in the training set, and C is the number of 
classes, (reviewed in Appendix A.3). In addition, the N—dimensional BCS matrix S, has 
a maximum rank equal to C —1, as this matrix is the combination of C matrices of rank 1. 
Note also that the number of images, P, is usually smaller than the number of pixels per 
image N, in face recognition applications which result in singular scatter matrices. When 
all of the above is taken into account, the Rayleigh quotient can no longer be solved di- 
rectly [Belheumer, 1997]. A possible solution to the matrix S,, singularity problem is to 
reduce the data dimension, in order to insure that only non-singular scatter matrices are 
produced prior to applying the LDA scheme, [Belhumeur, 1997]. The resulting scheme is 
called the “Fisherface approach.” 


Thus, in the Fisherface approach, original images are initially projected using the 
PCA approach in order to reduce the dimension of the projected data to P—C . Then, the 
LDA approach is applied to further reduce the data, to dimension C—1. Next, class cen- 
troids are computed using the training data, as it was done for the PCA approach. Finally, 


testing images are projected onto the smaller dimensional space using the projection ma- 
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trices defined in PCA and LDA steps. Final class decision is obtained by selecting for 
each testing image the class, which is closest in norm to the training data class centroids. 
D. PCA AND LDA COMPARISONS 

Typical implementations of PCA and LDA approaches can be compared by ap- 
plying them to two 2—dimensional data classes (each one with five samples), as shown in 
Fig. 2.1, where the two classes are represented by stars and circles, respectively. Figure 
2.1 also shows their 1—dimensional projections, as defined by the PCA (solid line) and 
the LDA (dashed line) approaches. This figure illustrates that the PCA approach causes 
projected classes to be interlaced, while the LDA approach best preserves discrimination 
between the projected classes. The respective Matlab code that performs the subject PCA 


LDA projection comparison is provided in Appendix B. 


PCA vs LDA Projections 





Figure 2.1. | PCA and LDA projection directions for the two 2 classes, “‘o” and “*”’. 


13 


E. CONCLUSIONS 

This chapter discussed two basic linear classifiers, PCA and LDA. The chapter 
also illustrated the notion that LDA is better suited to classification problems than is 
PCA. Nevertheless, it should be recognized that either algorithm is successful for data for 
which linear projections are well matched to the data structure. Extension of the ap- 


proaches to nonlinear classifiers is considered in the next chapter. 
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HI. NONLINEAR CLASSIFICATION ALGORITHMS 


This chapter extends the linear classifiers presented earlier to nonlinear schemes. 
First, we present the basic idea behind nonlinear classification approaches and introduce 
the concept of kernels, which is the key idea behind nonlinear schemes. Next, we de- 
scribe the “kernel trick” which makes such procedures computationally feasible. Finally, 
we present the specific application of kernel—based implementations to the linear dis- 
criminant analysis called the Generalized Discriminant Analysis (GDA). 
A. THEORETICAL BACKGROUND 

Even though LDA is designed for discrimination applications, it is based on a lin- 
ear projection approach. This may result in performance degradations when the data be- 
havior is not well-suited to such design constraints, especially when classes are not line- 
arly separable. A significant amount of research has been conducted recently in the area 
of kernel—based schemes, much of which has led to the implementation of nonlinear pro- 
jections. The kernel theory was first used in pattern recognition applications by Aizer- 
man, [Aizerman, 1964; Baudat, 2003]. The main kernel—based learning methods are the 
“Support Vector Machines” (SVM), the “Kernel Principal Component Analysis” 
(KPCA), and the “Kernel Fischer Discriminant Analysis” (KFD), also known as “Gener- 
alized Discriminant Analysis” (GDA), [Muller, 2001]. All of these methods have been 
widely applied in classification, regression, pattern and object recognition, optical charac- 
ter recognition (OCR), text categorization, time series prediction, gene expression profile 
analysis, protein and DNA analysis and other applications [Muller, 2001; Roobaert, 1999; 
Joachims, 1998; Brown, 2000]. 


Specifically, the following results have been recently reported when applying 
kernel—based schemes in: 


e Face recognition: Various nonlinear discrimination algorithms have been 
developed [Baudat, 2000, 2003; Socolinsky, 2003; Liu, 2004] by taking 
into account nonlinear features of the training data, thereby extending 
classification to data which cannot be solved using linear algorithms. For 
example, the GDA method has been applied successfully with various 
kernel functions [Liu, April 2004; Liu, May 2004; Hearst, 1998]. 
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e 3—Dimensional object recognition: the SVM algorithm has been succes- 
sively applied to recognize 3—dimensional objects when a limited number 
of training views are available [Roobaert, 1999]. 


e Text categorization: The objective of this application is to assign docu- 
ments into a set of defined document types, such as “news”, “historical”, 
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“business’’, “scientific”, “medical”, etc. [Joachims, 1998]. SVM ap- 
proaches have been applied successfully to organize network data, docu- 
ment databases, or even to learn a user’s web browsing preferences 
[Joachims, 1998]. 


® Optical Character Recognition (OCR): The goal of this application is to 
identify handwritten samples. The first real world experiments were ap- 
plied to OCR data using a basic SVM implementation and were shown to 
perform well (with error rates of only around 0.7%) [Muller, 2001]. 


e Gene expression profile and DNA and protein analysis: SVM approaches 
have also been applied to classify genetic functionality. Such applications 
require the extraction of data from the DNA strand relating to how a gene 
expresses itself [Brown, 2000; Muller, 2001]. 


1. Introduction to Kernel Theory 

The basic idea behind Kernel methods is to map the data, x €_X, into a large di- 
mensional feature space, F, via a function, g, [Baudat, 2003], where the mapped data 
have characteristics which can be manipulated with simple computational methods. This 
mapping may seem counter—intuitive due to the issues raised by the “curse of dimension- 
ality” [Muller, 2001]. Mapping essentially shows that the number of samples required to 
set—up an estimation problem increases exponentially, with the dimension of the sample 
space. Nevertheless, statistical learning theory also states that learning in the high dimen- 
sional feature space, F, can be simple if low-complexity criteria and specific mapping 
functions are selected. Therefore, the key issue when dealing with kernel—based schemes 
lies in the selection of the specific mapping @g. Rather than being carried out explicitly, 
this calculation is reformulated in terms of dot products applied to the data. Thus, data in 
the transformed space are expressed in terms of a kernel matrix containing comparisons 
of data pairs only, which makes the implementation of this process less expensive. 

2. Introduction to the “Kernel Trick” 

Assume that there are m training data [Scholkopf, 2000]: 

X= {Hy Ny 5e<05%,,} (3.1) 
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Also, assume that the training patterns, x,, belong to one of the two classes with 
labels, y,, defined as: 


y, €{-141}, Vi=1,2,....m. (3.2) 


Therefore, the training data with label assignment can be expressed as: 


(X12 Vp )s (%q5 Vy )o-e+3 (Xo Vn) EX X{-1, +]. (3.3) 


Recall that the goal of learning applications is to assign a label (i.e., class) to an unlabeled 
data pattern. Therefore, its goal can be viewed as predicting the value of labels, y, for new 
unlabeled patterns, x € X, in such a way that the ordered pair, (x, y), is as similar as pos- 
sible to ordered pairs defined in the training data. Such a procedure requires the definition 
of a similarity measure in the transformed space. As a result, a similarity measure be- 
tween two data points, x, and x,, is implemented as: 

k:XxX OR, 


3.4 
Cy ee ee ee 


where @ represents the mapping function and can be rewritten in terms of a kernel func- 
tion, k, for specific choices of mapping, then implemented in terms of a dot product only. 


This process is known as the “kernel trick” [Scholkopf, 2000]. 


In order to be able to use dot products as similarity measurements, input patterns 
(i.e., vectors) should lie in a dot product domain. This means that the arguments of the 
kernel functions should be dot products of the original input element vectors. Thus, the 
new mapped space, F’, includes the kernel function values, which have dot products as 


inputs. Generally, the space F is different from the original N—dimensional space. 


Next, we present a basic example to illustrate the “kernel trick” [Muller, 2001]. 


66,99 


Assume that there are two 2—dimensional classes, “‘x’”’ and “‘o’”, as shown on the 


left-hand plot of Fig. 3.1. 


Figure 3.1 shows that the original two classes are nonlinearly separable. Conse- 
quently, a nonlinear mapping @ is required to potentially transform the input data into a 
higher dimensional space in order to enable the transformed data classes to be linearly 


separable. In this specific case, the mapping @g will be: 
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(3.5) 





Figure 3.1. _ [Illustration of a kernel—based, nonlinear transformation for a two-class 
problem. Left plot: original, 2—dimensional input space; Right plot: transformed 3- 
dimensional feature space. (From [Muller, 2001].). 


A potentially nonlinear mapping that transforms the 2—dimensional element vectors into 
3—dimensional element vectors can be expressed as: 


(X1,%_) > (2%, 25,23), (3.6) 


where the transformed space features are defined as the second order monomial terms: 














(2i3 25924) (x2, V2x,x55%2). (3.7) 
Note that the 2 present in the above term, X,X,, 18 added for normalization purposes so 
that the nonlinear mapping can lead to a valid kernel expression, as shown below. 


Next, let us define the following polynomial kernel function as: 


k(x, y) 0 (x+y), (3.8) 














where x and y are 2—dimensional element vectors (i.e., x = (x,,x,), and y = (y,, y,) ). 


The second order kernel function is defined for d=2 as: 
k(x, y)=(x- vy (3.9) 


For 2—dimensional vectors x and y, the kernel expression k(x, y) may be rewritten as: 
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k(x, y) =(x- yy 
= (4,2) 92)? 
= (x,y, +x,),)° 
= (xX), +2x, VX). + X55) 
= (x?, V2x,x,,92)(97?,V2yy. 93)" 
=(9(x):- 9(y)). 


(3.10) 


The above equations show that the nonlinear mapping ¢g defined in Eq. (3.7) is 
performed implicitly via the kernel expression, k(x, y), by the use of dot products in the 
input domain. 


It can be shown that such nonlinear mapping potentially offers the following ad- 


vantages [Scholkopf, 2000]: 


e Similarity measures (dot products) in the transformed space, F, can be de- 
fined as: 
K(x, X,) = (G(x) P,)). (3.11) 
e Linear algebra and analytic geometry concepts may be applied in the 


transformed space, facilitating data manipulation. 


e Flexibility in selecting the mapping function, g, allowing for the develop- 
ment of specialized algorithms. 


3. Support Vector Machines (SVM) 
SVM is a type of pattern classification and regression method that uses the con- 


cept of kernel functions, in conjunction with dot products [Scholkopf, 2000]. 


The basic idea behind SVM is to identify a hyperplane in a transformed high di- 
mensional space that linearly separates the mapped classes in that space. This identifica- 
tion procedure is accomplished via special kernel functions that are applied to the input 


element vectors. 


In order to clarify the concept at the basis of the SVM approach, consider the 
simple case of the classification of two nonlinearly separable classes. For this particular 


case SVM is used to select the optimum mapping q of the input space into a space in 


which a class separating hyperplane can be defined. This type of hyperplane is illustrated 
in Fig. 3.2 and lies approximately in the middle of the shortest distance of the two 


classes’ convex hulls. The support vectors define the exact location of the hyperplane. In 
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Fig. 3.2, one can observe the support vectors in this newly transformed space, F’; these 
are a product of the elements of the two classes, the separating hyperplane, and the clos- 


est element vectors of each class to the hyperplane. 


The key point behind the SVM approach is that the kernel function of the element 
vectors’ dot products may be used directly, 1.e., without the need to carry out the mapping 


of the inputs to the new high dimensional space explicitly. 


Support Vector Machjne : Basic Idea @ Class 1 
A Class 2 





Hyperplane 


Figure 3.2. Basic idea of Support Vector Machines: Identification of the two-class 
optimum separating hyperplane in the new, high—dimensional transformed space. (After 
[Scholkopf, 2000].). 


The hyperplane in the high-dimensional transformed space is actually a nonlinear 
decision boundary in the input space. The aforementioned decision boundary, when 
viewed in the original input space, is shown as the nonlinear curve in Fig. 3.3, with the 
support vectors (marked as double circles) on the two curved lines above and below the 
nonlinear decision boundary. In the subject figure, one class consists of circle elements 


and the other of discus—shaped elements. 
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o: Class 1 
e: Class 2 
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Figure 3.3. | Original 2—dimensional two-class problem (classes, “‘o” and “e”’), show- 
ing the support vectors and the nonlinear optimal boundary. (From [Scholkopf, 2000].). 





The main advantages provided by the SVM method are [Scholkopf, 2000]: 
e The SVM method can identify and eliminate data outliers. 


e SVM schemes offer flexibility via the specific selection of the kernel func- 
tion. They also allow for flexibility to choose the most suitable similarity 
function, and can handle large feature spaces. 


The main drawbacks of the SVM method are [Scholkopf, 2000]: 


e The size of the quadratic programming problem that results from the SVM 
algorithm can be quite large [Scholkopf, 2000], resulting in high computa- 
tional load and the potential of numeric problems. 


e The potential of high noise levels in the classes’ element vectors increases 
the SVM implementation complexity as additional conditions have to be 
met to minimize noise impacts on the classification accuracy, resulting in 
an increased computational load. 


We note that optimization algorithms addressing the above issues have been re- 


cently proposed in the literature [Scholkopf, 2000]. 
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4. Kernel PCA 

The Kernel PCA executes a nonlinear mapping of the input space to a new high 
dimensional space, using the “kernel trick”. Next, the classic linear PCA is applied in the 
transformed space, resulting in a linear eigenvalue—eigenvector decomposition problem 
of a matrix, whose data elements are the values of the kernel function [Hearst, 1998]. 
Numerous results have been reported with this scheme which shows superior perform- 
ance over the basic PCA implementation, [Scholkopf, 1998; Kwang, 2002; Lu, 2003]. 
B. GENERALIZED DISCRIMINANT ANALYSIS (GDA) 

1. Introduction 

Kernel concepts may be applied to classical algorithms, such as the PCA or LDA. 
Applying kernel theory to LDA means that the linear LDA algorithm is applied to a 
nonlinear transformed data, resulting in a nonlinear algorithm without the drawbacks pre- 
sent in a direct nonlinear transformation of the algorithm. Therefore, the GDA solves an 


ordinary eigenvalue—eigenvector decomposition, albeit of the transformed data instead. 


Recall that the GDA is a supervised scheme. Therefore, issues dealing with algo- 
rithm generalization are still an issue, which in this case depends on the geometric char- 
acteristics of the training data, and not on its dimensionality. As a result, a judicious se- 
lection of the mapping function g is a significant factor in insuring the reliability of the 
results and a lower computational load. 

2. Algorithm Description 

This section describes the GDA derivation and closely follows the presentation 
given by [Baudat, 2000]. Assume that x is a column vector representing a reshaped image 
of the data matrix _X containing the M images included in the training dataset. Further, 


assume that X, represents the matrix containing the n, images of class i, where 
i=1,...,N, and Nis the total number of classes under study. Then, the total number of 


images contained in the training dataset is given by: 


N 
M=)in, (3.12) 
i=l 


and the data covariance matrix C is defined as: 
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Casa (3.13) 


assuming that the training data is centered (i.e., has mean equal to 0). 


Next, the input space XY must be mapped into a Hilbert space, F, by using a 


nonlinear mapping function g: 


og: X OF. 
(3.14) 
x > G9(x). 
Then, the covariance matrix in the resultant feature space, F, is given by: 
V =— YP," (x;)s (3.15) 
M y= 


assuming that the data is centered in the transformed space. 


Next, the covariance matrix B that is derived from the class centers expressed in 


the transformed space, F’, is defined as: 
B= > nga 3.16) 
us iP; » : 
where @, represents the mean value of class i, and is defined as: 


6 =+Y ol%,), (3.17) 


Nn; k=1 
where x, is element k of the class i. 


The data covariance matrix may be expressed in the transformed space, F, as fol- 


lows: 


N 


(eee! ; 
V=—P > 9% )P" (Xp) (3.18) 
M i=l k=1 


Note that the data covariance matrix V defined above in Eq. (3.18) is expressed in 
terms of dot products only. The main advantage behind kernel—based schemes is that 


nonlinear algorithms may be implemented in terms of expressions involving only the dot 
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product on the data considered, 1.e., without having to explicitly define and use the actual 


nonlinear transform expression g(x). The dot product expression obtained for data x, and 


x, 1s defined by the following kernel function: 
k, = k(x,,x,)=@" (x,)9(x,). (3.19) 


When dealing with different data classes, the kernel expression becomes: 


ki) ng =P (X,) PX,» (3.20) 
where p and q are data classes. 


Next, define the (M xM ) dimensional matrix K as: 


Ke= (Ke) yi (3.21) 
gz=l..N 
where K pg ate (n xn i dimensional sub-matrices defined as: 
= (K;,), (3.22) 


with i=1,...,n,, golLsghs p=l,...,N, and q=1.,...,N. 


Next, define the M—dimensional block diagonal matrix W, where each block W, is 


of dimension (n,xn,), and given by: 


eee (3.23) 
BNR (ahs a 


where n, represents the number of the data samples of class i. 


Recall that LDA maps the input space into one where the principal components 
maximize the ratio of the between—class—scatter matrix (also known as the inter—class 
inertia) and the within—class—scatter matrix (also known as the intra—class inertia). The 
use of kernel functions expands the LDA application to include the nonlinear transformed 
space. The principal components in the resultant space are nonlinearly related to the input 
variables and the kernel function, k, contributes to the creation of the function that 


nonlinearly separates the classes. Following the findings used in the LDA derivation, the 
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7° 
vy By 


GDA maximizes the Rayleigh quotient of the between-class scatter matrix over 





v Vv 

the within—class scatter matrix, where these V and B matrices (defined previously in Eqs. 
(3.15) and (3.16)) are defined in the transformed space. Therefore, the LDA maximiza- 
tion problem is equivalent to solving the following eigenvalue—eigenvector decomposi- 
tion problem: 


AVv = By, (3.24) 
where 4 and v are the resulting eigenvalues and eigenvectors, respectively. 


The eigenvectors v can be expressed as a linear combination of elements ex- 


pressed in the transformed space as: 


N 
v= > >a, (%p_)> (3.25) 
p= q=l 
where a are defined as the coefficients of the linear combination. Note that 


Pq ? p=l,...Nsg=l,...N > 


the coefficient vector a =(a,,) can be expressed in a more compact form as 


p=l,..N3q=1,...n, 


a=(4,),1,.v> Where a, =(a is defined as the coefficient of the vector v in class 


Pq rene 
Pp. 


Baudat showed that the Rayleigh quotient expressed in the transformed space, 


may be rewritten as [Baudat, 2000]: 





7 v’ By 7 a’ KWKa 


aA : 
v Vv a' KKa 


(3.26) 


Therefore, maximizing the above Rayleigh quotient can be executed by solving the fol- 
lowing generalized eigenproblem: 


KWKa=AKKa. (3.27) 


Note that the matrix K is symmetric and positive definite when the kernel type se- 


lected satisfies Mercer theorem [Scholkpof, 1998]. Therefore, K may be expressed as: 
K=PYP’, (3.28) 
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where P and /’are the eigenvector and nonsingular diagonal eigenvalue matrices, respec- 
tively. In addition, P has full rank when K is positive definite and the eigenvector matrix 


P is unitary, ie., PP’ =P’ P=I. 


At this point, replacing the matrix K by its eigendecomposition in Eq. (3.27) leads to: 
(PUP’ )W(PTP’ )a = A(PTP’ PVP" Ja. (3.29) 


Recall that the eigenvector matrix P is unitary. Thus, Eq. (3.29) becomes: 


(PUP’ )W(PTP’ )a = A(PT)\(VP’ Ja. (3.30) 


Using the fact that the eigenvalue matrix P and [have full rank (1.e., are invertible matri- 
ces) when the kernel type is selected to satisfy Mercer theorem, Eq. (3.30) may be simpli- 
fied to: 

(P’)W(PUP' )a = ATP’ Ja. (3.31) 


In order to simplify the following calculations, a new variable / is defined as: 


B=TP"a. (3.32) 


Then, replacing Eq. (3.32) into Eq. (3.31) leads to: 
(P' WPB=AB. (3.33) 
Therefore, maximizing the Rayleigh quotient shown in Eq. (3.26) above may be 
obtained by first, solving the eigenproblem derived in Eq. (3.33) in order to obtain # ; and 
second, by computing the coefficient vector a from f, as a= PI'£. Note that the ac- 


tual a coefficients are not unique, as only the direction of the eigenvectors obtained by 
solving an eigenproblem is unique. The a coefficients can be computed by setting the 
vectors v to norm |. Thus, taking into consideration Eq. (3.26), the following expression 
is obtained: 


vv=l (3.34) 


YY LP" (Xp) P(X) = 1 (3.35) 
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Therefore, 


N WN 
vv=>> @, K,a;=1, (3.36) 


which yields: 
vv=a'Ka=l, (3.37) 


where the dimension of K,, is equal to (n »X n,).as the dimensions of @, and @, are 


(n x 1) and (n, x1),respectively. Thus, the coefficients a need to be divided by Va'Ka,, 
in order to normalize the vectors v to norm 1. 


Finally, testing data z may be projected into the nonlinear feature space by pro- 


jecting them as: 


Fao ye. (3.38) 


p=! q=l 
3. Kernel Selection 
Recall that the basic idea of the main kernel—based schemes is to project the data 


into a feature space via a nonlinear projection g and to apply linear schemes in that new 


space. The kernel function required for this implementation is expressed in terms of dot 
products as: 


K(x, y) = p(x) PY). (3.39) 


Currently, several types of kernels are being used in kernel—based implementa- 
tions. The three most significant are: 


° The polynomial kernel of the form: k(x, y)=(x-y)*, where d>1 has 
been used extensively. Note that the GDA implementation reverts back to 
the classical LDA implementation when d =1.In addition, [Liu, 2004] re- 
cently investigated polynomial kernel functions with fractional powers 
(i.e., when 0 <d <1), which can be applied to face recognition applica- 
tions, even though such studies may not necessarily satisfy the Mercer 
theorem, and may not be considered as valid kernel functions. Neverthe- 
less, Liu showed in his simulations, (when using 30 features or more on 
visible imaging data), that fractional degree polynomial kernels lead to 
higher recognition rates than do those obtained with integer—degree poly- 
nomial kernels [Liu, 2004]. 
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° The Gaussian kernel function defined as: 


bf 


K(x,y)=e 9° , (3.40) 





where the parameter, o , has to be selected arbitrarily. 
° The sigmoid kernel defined as: 


k(x, y) = tanh(k(xy) + 9), (3.41) 


where 0<k and <0. The sigmoid kernel function has been successfully used in prac- 
tice, (especially in SVM applications), even though it does not actually define a positive 
semi—definite Gram matrix and thus, is not a valid kernel function [Liu, 2004]. 

4. Application to Iris Data 

The GDA scheme is first applied to a classical dataset commonly used in classifi- 
cation applications for benchmarking purposes, namely the “Iris” dataset [Baudat, 2000]. 
The specific Iris data consists of four real parameters of the flower Iris divided into three 
classes, each with 50 elements of dimension four. The Iris data consists of: the sepal 
length, the sepal width, the petal length, and the petal width, with all measurements ex- 
pressed in millimeters. The three classes represent three different types of Iris flowers: 
Iris setosa, Iris versicolor, and Iris virginica. The Iris dataset is interesting, as one class is 
linearly separable from the other two, whereas the other two are not linearly separable 
from each other. As a result, linear classifiers are unable to separate the three classes very 


well. Next, we will show that the GDA is able to separate all three classes well. 


Using the regular LDA leads to two overlapping projected classes, with only one 
being separable from the other two. Such overlapping is shown in Figs. 3.4 and 3.5, 
which plot the Iris data 1—and 2—dimensional projections in the LDA-derived directions 


[Baudat, 2000]. 
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1—Dimensional LDA projection of Iris data. Classes 2 and 3 are not sepa- 


Figure 3.4. 


rable. 
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2-D Projected Training Iris Data using LDA 


© class 1 
* class 2 
& class3 


= 
* 
@1-- 





Figure 3.5... 2—Dimensional LDA projection of Iris data. Classes 2 and 3 are not sepa- 
rable. 


Next, Figs. 3.6, 3.7, and 3.8 plot the 2—dimensional class projections obtained us- 
ing the GDA with a Gaussian kernel, [Baudat, 2000], where different values of the arbi- 
trary spread parameter o were selected (o =. 3,007, 7) .The plots were produced using 
the software available from [Baudat, October 2000]. Results demonstrate that all three 
projected classes are widely separated from the other two. It should be noted that the pro- 
jected data spread increases with increasing values of the spread parameter, which may 


result in nonlinearly separable projected data, as shown in Fig. 3.8. 


30 


© class 1 
* class 2 
& class3 


' 
' 
' 
' 
1 
' 
' 
1 
1 
' 
' 
' 
' 
' 
7 
1 
' 
' 
' 
' 
' 
' 
1 
' 
4 
1 
' 
' 
' 
' 
' 
' 
' 
' 
d 
' 
' 
' 
1 
' 
' 
' 
' 
' 
' 
” 
' 
' 
' 
' 
' 
' 
‘ 
' 
' 


See eee eee eee errs 


eee eee 


were een p eee e eee 


sieieatataiatetatatial deateieietatetetetets teletetetetetetated 


Sela ietatatals tolatatetatatatatatet 


lta tals Relat iaiatatal 


ee ep ee gr rte prec rreee 


eee ee ee ee ee ee eee ee ey 


% 





GDA projection of Iris data. All classes are separable, but the element vec- 


Figure 3.6. 
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GDA projection of Iris data. All classes are separable (o 


Figure 3.7. 
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2-D Projected Training Iris Data using GDA- Gaussian Kernel-variance=7 
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Figure 3.8. | GDA projection of Iris data. The two classes are not entirely separable 
(o = 7 Large variation.). 


C. CONCLUSIONS 
This chapter presented the basic concepts behind kernel—based algorithms and 
their applications to classification problems. It also described the GDA implementation, 


which constitutes the main research tool of this study. 


The following chapter discusses the application of the GDA to the IR face data- 


base. 
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IV. GDA RESULTS 


This chapter considers the application of the GDA approach to the IR face data- 
base. First, it investigates how the specific type of distance metrics selected in the algo- 
rithm may affect classification performances. Second, the concepts of rank score and con- 
fidence interval are reviewed and it is shown how they can be applied toward analyzing 


and comparing classifier performances. 


Next, we apply the above distance measures to the GDA implementation consid- 
ered in this study. Specifically, we investigate the link between distance metrics and re- 
sulting classification performances obtained using the IR database. We also investigate 
the links between kernel function type, and number of eigenvectors selected in the GDA 
projection operation and the resulting classifier performances. 

A. THEORETICAL BACKGROUND 

1. Introduction 

First, this section considers various types of distances commonly used in pattern 
classification applications. Second, it reviews the concepts of rank score, confidence in- 
terval, and confidence score. These parameters are used to evaluate how specific dis- 
tances, kernel types, and number of eigenvectors extracted from the kernel matrix K in- 
fluence the classifier performances. 

2. Distances Considered 

According to [Duda, 2001], pattern classification is a one—to—one assignment data 
to a class belonging to a set of pre-defined element classes. This assignment is based on 
the concept of nearest neighbor classifier, which is in turn based on the minimization of a 


metric or distance function between testing data and each class centroid. 


Distances are measures which evaluate the difference between two vectors v and 
w. Many types of distances have been commonly used in engineering applications, such 
as the Euclidian Norm, Norm 1, etc. Some distances are known to perform better than 


others depending on the data and the specific pattern classification algorithm considered. 


of 


The following section investigates five different distance measures commonly used in 
pattern classification applications which are considered in this study. We will select the 


distance leading to best overall classification performances in the rest of the study. 


The following five distances are considered, where it is assumed that the vectors v 


and w are real-valued and of dimension (N x1) [Socolinsky, 2003]: 


1. Euclidean Norm 2: 
N 2 
PO,w)=)%-,) » (4.1) 
i=] 


M(v,w) =(v-w)' = (v-w), (4.2) 


2. Mahalanobis: 


where » is the data covariance matrix defined as [Bishop, 1995]: 
E=E|(v—w)(v-w)' |, (4.3) 
3. Mahalanobis Norm 1: 


Ne A 
M'(v,w)= oa,7 ly, -wh (4.4) 


i=1 
where o, is an estimate of the data variance along the i coordinate direction, and V,,W, 
are the i” vector coordinates of v and w respectively, 


4. Mahalanobis Norm 2: 





wa= fSoro,-m) : (4.5) 


a: Mahalanobis Angular: 


N 
ay oO, vw, 
M“* (v, w) = arccos —=+_—_—_—_-. 4.6) 
: L’(0,v)L’ (0, w) : 
3. Rank Score 
The notion of rank score (also known as cumulative matching score) is based on 


the probability of guessing the data class assignment correctly [Philips, 1996]. As a re- 
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sult, a rank score equal to one (referred to as rank—1 score) is defined as the probability of 
making a correct class assignment. Accordingly, a rank score equal to 2 (i.e., rank—2 
score) corresponds to the probability of making a correct class assignment in the top (i.e., 
most likely) two decisions, and rank-N score refers to as the probability of making a cor- 
rect assignment within the top N candidates. Extending this concept, a rank—C score de- 
fined on a C-class data is equal to 1, as all potential classes are included. In addition, a 
classifier with a 100% classification accuracy has a rank score plot with rank—1 score 
equal to 1. Therefore, the rank score value, expressed as a function of the rank index, is a 
monotonically increasing function in [0, 1], and is commonly used in evaluating classifier 


performances [Philips, 1996]. 


Assume that Fig. 4.1 represents the rank score obtained for some classifier. This 
example shows that the probability of making a correct class decision (obtained for rank— 
1 score) is equal to 0.8. Next, the probability of making a correct decision in the first or 
second top decisions (represented with the rank—2 score quantity) is equal to 0.84. The 
example also shows that all correct guesses are contained in the first top five guesses, as 
the rank—5 score is equal to 1. In the following sections, references to specific classifica- 
tion performance measurements will refer to rank—1 score values, unless specified other- 


wise. 
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Cumulative Rank Score 





Rank Score Index 


Figure 4.1. | Rank score example. 


4. Confidence Interval Definitions 

Recall that the sample mean computed from finite data only approximates the 
(true) population mean, with accuracy depending on the data size [Jain, 1991]. A useful 
statistical measure in addition to the sample mean is the associated confidence interval, 
which provides the user with a measure of reliability of the observed mean. Ideally, a 
confidence interval width should be as small as possible, because the true mean lies 
within the confidence interval with some predefined probability and thus it is easier to 


estimate its true value. 

Next, the concept of confidence interval (CI) is briefly reviewed. Assume that 
there are n observations Eeane obtained for one trial of the whole population of 
observations (i.e., measurements) [Jain, 1991]. Further, assume that the population has 


true mean yu, standard deviation o, and sample mean //. Note that the true mean yw is a 
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deterministic number, whereas the sample mean # is a random number. As a result, 


sample means computed from different data trials will be close to the true mean, but will 


most likely be different from each other. 
The fact that the sample size is finite means that /# is just an estimate of the true 
mean ju. Thus, it is quite useful to estimate the corresponding confidence interval [b,., | 


defined around y, as this information gives the user a measure of confidence in the esti- 


mated true mean. The values b,,b, are called the CI lower and upper bounds, respec- 
tively. The measure of confidence is defined as 1—a@, where @ is called the significance 
level, and 100- (1 — a) is called the confidence level. This definition means that there is 
100- (1 7 a) % probability that the true mean, computed from multiple trials, lies in the 
derived confidence interval [b,.2,]. However, one trial can be selected to estimate the 


confidence interval associated with the sample data mean “, provided the sample size n 


is large enough, as a result of the central limit theorem. In such a case, @ can be shown 


; : : 432 o 
to be approximately Gaussian with mean yw and standard deviation ——, where n and o 


Ta 


are the sample size and standard deviation of the data sequence, respectively. 


It can be shown that the data mean confidence interval can be expressed generally 


as: 


(4.7) 


where i : is obtained from the Student’s t-distribution, where m—1represents the 
1-—;n— 


number of degrees of freedom and n is the sample size. However, for large sample sizes, 
the t—distribution values can be approximated by the normal distribution values. There- 
fore, given that the sample size is 1000 for this study, the CI is computed using the nor- 


mal distribution as: 


[b,.0,)=|#-z ,=.H+z “Tn j (4.8) 


In this study we select a 95% confidence interval, 1.e., @ =0.05, which leads to a 


value of z , =Z oo; =1.960, [Jain, 1991]. As a result, the corresponding 95% confi- 
I- 1-—— 


2 2 
dence interval is defined as: 


Oo Oo 
b,b,|=| 4-1.960—, 7 +1.960— }, (4.9) 
[ 1 | L ay HL S| 


where o and “, and n are the standard deviation, the estimated mean, and the sample size 


of the measured data x. The value of z , for the Gaussian distribution can be found in 
i=. 
2 


look-up tables contained in statistical textbooks and others [Jain, 1991]. 


Confidence 
Interval 





Figure 4.2. Mean confidence interval. 


5. Confidence Score 
Confidence scores are scalar quantities defined in the range [0, 1] which provide 
an evaluation of the confidence in the class decision. As a result, the confidence score 


u,, of image j represents the likelihood that it belongs to class 7. The confidence score 


value is defined for a C—class problem as: 


i£-<—<$<<— (4.10) 


where d,, represents the distance between image j and the i " class centroid, and m is a 


weighting exponent which can take values between [1,0] [Jang, 1997]. We selected 


m=2 in this study. Various distance types may be selected in the confidence score defi- 
nition. In this study we selected the Mahalanobis Angular distance as it closely approxi- 
mates classification performances obtained with the Mahalanobis distance at a reduced 
computational cost. Further details regarding the distance metric selection are discussed 
in Chapter IV.B.4. 

6. Kernel Matrix K Eigen—Decomposition Issues 

An important parameter in this study is the number of top eigenvectors of the ker- 
nel matrix used in the derivation of the GDA projection vectors. Recall that eigenvectors 
associated with “negligible” eigenvalues may be discarded in the computation of the 
GDA projection vectors v ’s, as defined in Chapter III.B.2, since they are expected to 
have little impact on the resulting projected data. Further, recall that the kernel matrix 


size is directly related to the number of training data samples (900), resulting in a 


(900x 900) matrix for our study. Figure 4.3 plots the eigenvalues as a function of the 


eigenvalue index obtained from the eigen—decomposition of the kernel matrix K, for one 
typical iteration in the database. Note that eigenvalue magnitudes decrease quite rapidly, 
and that those corresponding to index 500 or higher do not seem to contribute signifi- 
cantly to the kernel matrix structure. Further, note that truncating the eigen— 


decomposition can result in significant computational savings. 


Therefore, we investigated whether truncating the kernel matrix K eigen— 
decomposition specifically affected the resulting classification performances with the fol- 
lowing experiments: 


e Keep all eigenvectors assigned to eigenvalues with value greater than or 
equal to the maximum eigenvalue divided by 1000 (unless specified oth- 
erwise, this is the referred to as the “default case” as was proposed by 
Baudat [Baudat, October 2000]. For practical purposes, simulations have 
shown that this choice comes down to keeping about the top first 700 ei- 
genvectors, 1.e., the eigenvectors associated with the top 700 eigenvalues 
in magnitude). 


e Keep the top 100 eigenvectors. 
® Keep the top 150 eigenvectors. 
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e Keep the top 200 eigenvectors. 
° Keep the top 250 eigenvectors. 
° Keep the top 500 eigenvectors. 
The results are presented in Chapter IV.B. 
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Figure 4.3. Eigenvalues of the kernel matrix K. 
ye Kernel Function Types Considered 


Various types of polynomials may be used in kernel—based schemes, as men- 
tioned earlier. First, the Gaussian kernel was selected, as it has been used with success in 
numerous applications. Next, polynomials of different degrees were considered such as: 


° Polynomial of degree 1: k(x, y) = (x- y), which is equivalent to the LDA 
implementation [Baudat, 2000], 


e Polynomial of degree 2: k(x, y) =(x- y)’, 


* Polynomial of degree 3: k(x, y) =(x-y)’. 
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Results showed the Gaussian kernel led to similar classification performances as 
that obtained for the polynomial of degree 2, for the selected spread value o = 7000. In 
addition, it was noted that selecting the polynomial kernel allowed us to efficiently vec- 
torize the Matlab code, initially written by Baudat and Anouar [Baudat, October 2000], 
and significantly reduce the associated computational load. Details regarding the vectori- 
zation operations are presented in Appendix B. Specifically, it was noted that the Matlab 
implementation with Gaussian kernel took about 7.5 to 8 minutes per iteration, while the 
vectorized Matlab implementation with polynomial kernel of degree 2 took about 75 sec- 
onds per iteration in the database, using a 3—GHz PC. 

B. EXPERIMENT DESCRIPTION 

1. Data Selection Procedures 

The database considered in this study is that previously used in [Lee, 2004], and 
specific details regarding the collection process and data manipulation can be found 
within. The database includes 50 classes (adult subjects). Each subject was asked to gaze 
at one of ten points on the wall behind the camera to introduce angle and tilt variations in 
the subject pose. Each class has the following three sections: 

1. Photographed in a neutral pose. 

2: Photographed pronouncing the vowel “u”. 

3: Photographed while smiling. 

As a result, the complete database consists of (50x10x3) = 1500 images. All IR 


images were then cropped to retain only the middle section of the face, which excluded 
the chin, some of the forehead, and the ears, resulting in cropped images of dimension 


equal to(45 x 60) pixels. Next, cropped images were reshaped as column vectors of di- 


mension (2700x1), resulting in an overall database matrix of size (2700x1500). 


The data were split into nonoverlapping training and testing sets, and 40% and 
60% of the data per class were selected for testing and training sets, respectively. We im- 
plemented a cross—validation variant to estimate classification performances based on 
resampling and averaged over multiple repetitions of each experiment (referred to as an 
iteration in this study). Therefore, for each iteration, images in each section and class are 


randomly permuted, and the first four resulting images (40%) of each section and class 
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were selected to create the Testing Image Data Matrix (TEIDM), while the last six im- 
ages (60%) were selected to form the Training Image Data Matrix (TRIDM). As a result, 
the TEIDM and TRIDM have dimensions equal to(2700x 600) and (2700x900), respec- 
tively. 

2. Number of Iterations per Each Individual Experiment 

A significant body of research dealing with small sample size experiments has 
been conducted in the research community, and most studies consider some type of aver- 
aging operations to reduce the impact of the data split between training and testing sets. 
In this study, we implemented a cross—validation variant based on resampling. We con- 
sidered 500 and 1000 iterations (i.e., repetitions of each experiment) and selected 1000 
iterations to collect sufficient reliable statistical data regarding classification perform- 
ances. 

3. Software Implementation 

Four main categories of experiments were conducted in this study: 


1. Implementation of the Fisherface algorithm, using the GDA with polyno- 
mial kernel of degree 1, to benchmark results against those obtained ear- 
lier by Lee [Lee, 2004], 


2. Selection of the best distance metric to be applied in the classifier. This se- 
lection was conducted using 3—dimensional distance map plots, which are 
discussed later in Chapter IV.B.4. 


a: Computation of rank scores and overall classification performance per 
class for 40/60 cross—validation variant and 1000 iterations. 


4. Computation of 95% confidence intervals for rank—1 to rank—5 scores 
mean classification results. 


All the software code developed for these simulations is included in Appendix B. 
Note that the software implementation used in this study was either written by the author 
or modified from that developed earlier by Baudat [Baudat, October 2000]. The main 


steps associated with a given iteration are listed in Table 4.1. 
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1. Execute one random permutation of the images within each section 
and class, as discussed in Chapter IV.B.1 above. 


2: Generate TRIDM and TEIDM matrices, as described in Chapter 
IV.B.1. 


Data Selection 





3: Apply the GDA algorithm to the TRIDM to create the kernel matrix 
K. 


GDA steps Compute the eigen—decomposition of the kernel matrix K. 








4. 

5. Project TRIDM and TEIDM matrices onto the kernel matrix K. 

6 Compute the class centroids from the projected TRIDM information. 
7 


Compute the projected training images covariance matrix (needed for 

the Mahalanobis distance considered in the study). 

Classification | 8. Compute the distance between projected training data and class cen- 
steps troids and between projected testing data and class centroids. 

9. For each distance considered and projected training and testing data 


sample, make class decision by selecting as class that which leads to 
the smallest distance between projected data and class centroids. 








Table 4.1. Main software steps required for a given iteration. 


4. Distance and Eigenvector Impacts on Classification Results 

Recall that the classifier scheme analyzed in this study belongs to the family of 
distance classifiers. Thus, the specific choice of distance may be an important parameter 
in the resulting algorithm performance. As a result, we investigated how the selection of 
the specific distance influenced resulting classifier performances. In addition, the specific 
number of eigenvectors selected in the GDA kernel matrix K is also an important user— 
specified parameter. Therefore, we also investigated the impact the number of eigenvec- 
tors extracted from the kernel matrix K has on resulting classifier performances. To visu- 
alize the impact these two parameters have on the classifier performances, we generated 
3—dimensional training and testing distance maps, which plot the distances between each 
training (testing) image and all training data centroids, respectively for the five distance 
types discussed earlier in Chapter IV.A.2, while varying the number of top eigenvectors 


in the kernel matrix K. 


Figure 4.4 illustrates the Matlab color code variations used in Figs. 4.5 to 4.14., 
which investigate the classifier performance as a function of the distance used. Table 4.2 
lists all experiments conducted according to the various types of distances used, the type 


of images selected (testing/training), and the number of kernel matrix K eigenvectors se- 
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lected for the projections. These simulations were repeated ten times and Figs. 4.5 to 4.14 


represent a typical result, (as significant consistency was observed between trials). 


Colormap Editor 
File Edit Tools Help 


Current color info 


Index: 
CData: 


Vv 


Color data min: |0.0 
Interpolating colorspace:|RGB ¥ 
Color data max: |1.0 


¥ Immediate apply OK Cancel Apply Help 


Figure 4.4. Matlab color code scaling. 












































Type of Distance used | Training im- Training Testing im- | Testing im- 
ages, Default | images, 250 | ages, Default ages, 250 

Norm—2 x xX x xX 
Mahalanobis x x x x 
Mahalanobis Norm-2 x x x Xx 
Mahalanobis Norm-1 Xx x x x 
Mahalanobis Angular x x x xX 
Table 4.2. Distance maps computed. “Default” or “250” refers to the number of ker- 


nel matrix eigenvectors kept in the GDA algorithm step. 


Each figure corresponds to a specific distance and number of kernel matrix K ei- 
genvectors used in the GDA projection step, and is subdivided into four sections. The two 
sub-—figures in the top row represent the 3—dimensional distance maps obtained for train- 


ing images (left) and testing images (right), respectively. The bottom row represents the 
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corresponding contour plots. Distance maps generated are of dimension (XY x Y), where XY 
Pp g p ps g 


represents the number of training (equal to 900), or testing (equal to 600) images on the x 
axis, and Y (equal to 50) represents the number of classes on the y axis. Finally, the verti- 
cal z axis represents the measured distance between each sample and all training data 


class centroids. 


The distance maps are set up so that diagonal values represent the distance be- 
tween each image and its own class centroid, while off-diagonal elements represent dis- 
tances between images and centroids from other classes. As a result, one would expect to 
see small distances (i.e., blue colors) on the diagonal elements and larger ones (i.e., to- 
wards red colors) on the off-diagonal elements to represent good classification behavior. 
The “perfect” classifier should exhibit a distance map showing uniform deep blue diago- 
nal elements and uniform red off—diagonal elements. As a result, these distance maps 
make it possible to evaluate the resulting classifier performances visually by noting: 


e How uniformly deep blue diagonal elements are (deep blue color on the 
main diagonal insures the smallest distances are observed between images 
and centroids from their own class). 


e How red off-diagonal elements are (red on the off-diagonal elements in- 
sures larger distances between images and other class centroids are ob- 
served). 

e How uniformly red the off-diagonal elements are (uniformly red color on 


the off-diagonal elements indicate low variance in the results). 

In addition, results show that classification performances increase with the num- 
ber of kernel matrix eigenvectors selected in the GDA computation, except for one spe- 
cific case discussed later. Results also show that better performances are observed when 
dealing with training images than for testing images, as expected because the centroids 


are derived from the training data itself. 
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Figure 4.5. 


3—Dimensional plots—contours using Norm—2 distance and default number 


of eigenvectors. 
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Figure 4.6. | 3—Dimensional plots—contours using Mahalanobis distance and default 
number of eigenvectors. 
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Figure 4.7. 


3—Dimensional plots—contours using Mahalanobis Norm—2 distance and 


default number of eigenvectors. 
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Figure 4.8. | 3—Dimensional plots—contours using Mahalanobis Norm-—1 distance and 


default number of eigenvectors. 
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default number of eigenvectors. 
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3—Dimensional plots—contours using Norm—2 distance and 250 eigenvec- 
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Figure 4.11. 
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3—Dimensional plots—contours using Mahalanobis distance and 250 eigen- 
vectors. 
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Figure 4.12. 3—Dimensional plots—contours using Mahalanobis Norm—2 distance and 
250 eigenvectors. 
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Figure 4.13. 


3—Dimensional plots—contours using Mahalanobis Norm-—1 distance and 


250 eigenvectors. 
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3s Rank Score and Overall Class Performance 

Cumulative Rank Scores (CRS) and average classification rates are computed to 
investigate the impact that the various distances, polynomial degree orders and numbers 
of eigenvectors from the kernel matrix K selected in the GDA step have on overall classi- 
fier performances. All results provided are obtained with the 40/60 cross—validation vari- 
ant discussed earlier in Section B.1 and 1000 repetitions (1.e., iterations) to insure results 
are Statistically meaningful. Software implementations computing rank scores and aver- 


age classification rates are provided in Appendix B. 


Figures 4.15 and 4.16 show the rank score and classification performance on a per 
class basis, obtained for the testing images dataset, for one representative iteration. Addi- 


tional random iteration rank score plots are included in Appendix C. 


Cummulative Rank Score for 600 Testing Images 
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Figure 4.15. Testing data CRS plot; Mahalanobis Angular distance, default number of 
eigenvectors selected in the GDA step, one iteration. 
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Figure 4.16. Testing data average classification rate; Mahalanobis Angular distance, 
default number of eigenvectors selected in the GDA step, one iteration. 


Table 4.3 presents average classification rates and associated 95% confidence in- 


tervals obtained for all scenarios investigated in the study. 


The following comments can be made: 


e Results obtained for the Fisherface implemented as PCA and GDA with 
polynomial kernel of degree | and default number of eigenvectors selected 
in the GDA step are similar to those obtained earlier by Lee [Lee, 2004]. It 
is also noted that the Euclidian distance leads to better classification re- 
sults (94.59%) than the Mahalanobis Angular distance does (93.70%). 
Therefore, there is no advantage when using the Fisherface in selecting the 
Mahalanobis Angular distance. 
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Results show that when selecting a polynomial kernel of order 2 in the 
GDA step and keeping eigenvectors as specified in Baudat [Baudat, Octo- 
ber 2000] (shown in row 3 of Table 4.3), the best average classification 
rates are obtained with the Mahalanobis distance (98.44%), closely fol- 
lowed by the Mahalanobis Angular distance (98.41%). Recall that the Ma- 
halanobis Angular distance is less computationally intensive as it does not 
require computing the data covariance matrix inverse. As a result, the Ma- 
halanobis Angular distance was selected for all follow—on experiments, 
and all other results are based on this metric, unless specified otherwise. 


Figures 4.11 and 4.14 show that using only 250 kernel matrix K eigenvec- 
tors and the Mahalanobis Angular distance appears to lead to better sepa- 
ration between intra—class and inter—class distance values, and more con- 
sistent inter—class distance values, than observed with the Mahalanobis 
distance. 


Results show that reducing the number of kernel matrix K eigenvectors se- 
lected in the GDA step usually results in classification performance degra- 
dations. Specifically, the investigation was conducted by selecting the top 
100, 150, 200, 250, 500, and the default number, as defined by Baudat, for 
a given distance metric. Results show that classification performances de- 
grade as the number of eigenvectors decreases, with the exception of the 
slightly better performance observed with 500 eigenvectors (98.55%) than 
with the default number (around 700) (98.41%), which it is not viewed as 
statistically significant. Overall, eigenvectors associated with very small 
eigenvalues are expected to have small impact on the resulting classifica- 
tion performances. 


Additionally the GDA with polynomial kernel of degree 1 (1.e., LDA) was 
investigated directly and resulted in lower classification performances than 
those obtained with the Fisherface implementation. 


Results show a very slight increase in classification performances by in- 
creasing the polynomial degree to 3 (98.45%) from 2 (98.41%), when se- 
lecting the default number of kernel matrix K eigenvectors. 
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Overall Classification rates: Mean and 95% confidence interval 


















































: DISTANCE SELECTED 
Algorithm 
selected 
(polynomial deg. 
order, number of Nori? Mahalanobis Mahalanobis | Mahalanobis Mahalanobis 
eigenvectors se- Norm -2 Norm-1 Angular 
lected in the 
GDA step) 
GDA 0.9273 0.9282 
(1, default) [0.9266, 0.9280] [0.9275, 0.9289] 
ee. Prats 0.9370 
0.9453, 0.9466 0.9263, 0.9376 
(1, default) _| | 
GDA 0.9826 0.9844 0.9826 0.9799 0.9841 
(2, default) [0.9822, 0.9831] | [0.9840, 0.9848] | [0.9822, 0.9831] | [0.9795, 0.9804] [0.9837, 0.9845] 
GDA 0.9855 
(2, 500) (0.9851, 0.9859] 
GDA 0.9774 
(2, 250) [0.9769, 0.9778] 
GDA 0.9670 
(2, 200) (0.9665, 0.9676] 
GDA 0.9448 
(2, 150) (0.9442, 0.9455] 
GDA 0.8944 
(2, 100) (0.8935, 0.8953] 
GDA 0.9845 
(3, default) [0.9841, 0.9849] 








Table 4.3. Classification rates for testing data; 40%/60%-cross—validation variant, 
1000 iterations; means and 95% confidence intervals as a function of the kernel function 
polynomial degree, distance type, and number of eigenvectors selected for the GDA step. 
6. Mean Performance Confidence Intervals 
Figure 4.17 plots the 95% confidence intervals obtained for rank—1 to rank—5 
scores mean classification rates with the five distances investigated in our study and ker- 
nel function polynomial of degree 2. Results show that best rank—1 to rank —5 scores are 
consistently obtained with the Mahalanobis distance (as noted earlier for rank—1 scores in 
Table 4.3). Results also show that the Mahalanobis Angular distance rank—1 to —5 scores 
are consistently very close to the Mahalanobis scores, which further validates the selec- 
tion of the Mahalanobis Angular distance as the metric of choice in this classifier imple- 
mentation. The Mahalanobis Angular distance has better performance with fewer eigen- 


vectors selected than the Mahalanobis distance. 
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Mean Confidence Intervals of first 5 Rank Scores for Testing Data 


0.995 


0.99 


Rank Score Mean Confidence Interv al 


0.98 





0.975 
Rank Score Index 


Figure 4.17. 95% Confidence interval of the means of ranks 1, 2, 3, 4, and 5 of the five 
distances listed on the legend. 
a Confidence Score 
The confidence score provides the user with a measure to evaluate the confidence 
level with which each image belongs to a class. This parameter is bound between 0 and 1 
and is useful when evaluating the classifier decision confidence, and/or when fusing vari- 
ous classifier decisions on the same data. Figures 4.18 and 4.19 plot representative 3- 


dimensional confidence score maps obtained for one iteration. The dimension of the ma- 


trix plotted is again equal to (Xx xY ) , where X corresponds to the number of testing im- 


ages (600), and Y represents the number of classes (equal to 50 in our study). Note that a 
perfect classifier should lead to a confidence score map with diagonal elements equal to | 
and off-diagonal elements equal to 0. Figures 4.18 (3—dimensional plot) and 4.19 (asso- 
ciated contour) show a confidence score map that has diagonal elements significantly 
higher than off-diagonal elements with average values equal to 0.5 and 0.02, respec- 
tively. Thus, the results show good average performances on that specific iteration. The 


software implementation for this computation is included in Appendix B. 
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Figure 4.18. Testing set 3—dimensional confidence score map for one representative 
iteration. 
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Figure 4.19. Testing set confidence score map for one representative iteration. 


CONCLUSIONS 

This chapter presented the various parameters used to evaluate the GDA-based 
classification schemes considered in this study. We defined the concept of distance maps 
and showed how they can be used to visually evaluate classifier performances. We also 
discussed the concept of average classification rates and 95% confidence intervals, and 
confidence scores. The concepts of rank scores were reviewed and applied to compare the 
classifiers investigated in this study. We considered five different types of distances and 
investigated the impact that the specific number of eigenvectors extracted from the GDA 
kernel matrix K and selected in the GDA projection step has on resulting classifier per- 
formances. Results showed that the optimum performance was recorded for the Maha- 
lanobis distance, used with polynomial kernel function of degree 2, and by selecting the 


top 500 eigenvectors of the kernel matrix K eigen—decomposition. Results also show that 
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the classifier set-up with the Mahalanobis Angular distance has performances closely 
approximating those obtained with the Mahalanobis distance, at a much reduced compu- 
tational cost. Therefore, the Mahalanobis Angular distance was selected as the metric of 


choice in this study. 
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Vv. CONCLUSIONS 


This study extended earlier work by Pereira [Pereira, 2002] and Lee [Lee, 2004] 
who investigated the recognition of infrared face images collected using a low resolution 
uncooled IR camera under controlled indoor conditions. This study considered a nonlin- 
ear kernel—based classifier, the Generalized Discriminant Analysis (GDA) approach pro- 
posed earlier by Baudat [Baudat, 2000]. First, two basic linear classification schemes 
were reviewed, the PCA and the Fisherface approach, which have been widely used in 
classification applications. Next, the basic idea behind the GDA was presented. It was 
shown that the GDA can be viewed as a two-step process. First, the data is implicitly 
nonlinearly projected in a high-dimensional feature space where discrimination becomes 
easier to be performed. Second, the LDA is implicitly applied to the transformed data. 
Also the “kernel trick” concept was discussed, which is the fundamental concept behind 
the GDA. Finally, we proposed a vectorization step to the GDA software implementation 


proposed by Baudat, which allowed us to speed up computations by a factor of six. 


We investigated five different types of distances and selected the best suited for 
the database under consideration via the evaluation of 3—dimensional distance maps. The 
concepts of distance classifiers, rank score, confidence interval and confidence score 
were reviewed as performance measurement tools. A cross—validation variant was im- 
plemented to insure that the obtained results were statistically significant. Results showed 
that the GDA approach leads to better classification performances (98.55%) than those 
obtained with the linear classifiers considered in the two earlier studies, and increases the 
average classification performance by 5% over the Fisherface approach. Results also 
showed that a combination of the Mahalanobis Angular distance, a polynomial kernel of 
degree 2, and the top 500 eigenvectors lead to the best average classification perform- 


ances for the GDA implementation for the database under consideration. 


In conclusion, this study has shown that a low-cost, low-resolution IR camera 
combined with an efficient classifier can play an effective role in security related applica- 


tions. 


67 


THIS PAGE INTENTIONALLY LEFT BLANK 


68 


APPENDIX A. MATHEMATICAL BACKGROUND 


This appendix presents the main mathematical concepts used in the derivations of 
the PCA and Fisherface approaches discussed in Chapter II. First, the concept of 
Rayleigh quotient is reviewed. Next, linear algebra concepts dealing with eigenvalues, 
eigenvectors, and rank related issues are discussed. 

A. ANALYSIS 
if Rayleigh Quotient Analysis 
This section shows that computing the vector W, which maximizes the Rayleigh 


quotient to obtain the Fisher Discriminant vectors and given in Eq. (2.12), Chapter I.B.2: 


Wow = gma (A.1) 


opt 


IW'S,W | 
"IWS WI) 
is equivalent to solving the generalized eigenvalue problem shown in Eq. (2.13), Chapter 
II. B.2: 

S,w=AS\w. (A.2) 


Define the Rayleigh quotient expression J(w) as: 


IWS ,W | 


J(W) =. 
ce IW'S,,W | 


(A.3) 


The vector W which maximizes J (W) may be obtained by computing the first derivative 


of J (V) and making it equal to zero. The derivation closely follows that presented in 


[Gutierrez, 2004]: 








| re 
T 
qIW))_ |W syW 2523 
dw dw 
d(W'S,W] d(W'S_W] 
RSW eS | 0 A.4 
[W'S,W] a [Wes ,W] WW (A.4) 


=>[W'S,W]2S,W -[W'S,W]2S8,,W =0. 
Dividing the above equation by the scalar expression W’SwW leads to: 
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IW'S,W IWS WI 
W'S, Wl? |W'S_W\ 
=> S,W-JS,W =0 (A.5) 
=> SW =JS,W. 





SW = 


Note that the above equation is a generalized eigenvalue problem, which may be 
rewritten as: 


S,|5,W-JW =0, (A.6) 


which leads to: 


Si, S,W =JIW, (A.7) 


when S,, is anon singular matrix. Alternate derivations of the Fisher Discriminant vec- 
tors when the matrix S,, is singular have been presented by [Cheng, 1992], who pro- 
posed to replace the singular matrix by its rank decomposition. However, this specific 
approach does not appear to have been applied in the face recognition field and is not 


considered further in our study. 


Zz Computing the Number of Nonzero Eigenvalues of the Generalized 
Eigenvalue Eigenvector Decomposition Problem 


We show in this section that the following generalized eigen—problem, 
S,w=AS,,w (A.8) 
where S, and S,, are the Between—Class—Scatter (BCS) and Within—Class—Scatter 


(WCS) matrices, defined below, exhibits at most C —Inon-zero generalized eigenvalues, 


when S,, is not singular, where C is the number of the classes. 


Recall that: 
° The N—dimensional matrix S, is the Between—Class—Scatter—Matrix, is 
defined as: 


S, re! —m)(m,—m)* (A.9) 


I 
a 


where m is the mean of all the element vectors, n,and m,,are the number of element vec- 


tors, and the mean vector of the class i, respectively. 
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° The N—dimensional S,, is the Within—Class—Scatter Matrix, is defined as: 


Cc 
Se). 
i=1 


(x, -m,)(x,-m,)" (A.10) 


where m, and n, are the mean vector and the number of elements vectors of the class i, 


respectively. 


Note that, when SW is not singular, Eq. (A.8) may be rewritten as: 
5; S,W —JW =0, (A.11) 


which leads to 


S7S,W = JW. (A.12) 


Next, recall that the rank of a product of two matrices is less or equal to the minimum of 
the two matrix ranks [Strang, 2003], i.e., 
Rank(AB) < min(Rank(A),Rank(B)) . (A.13) 


As aresult, the rank of the matrix product S,'S, (Rank(S,,'S;,)) is less or equal 


to the smallest rank of the two matrices S and S;,. 


Further, recall that the rank of Sz is equal toC —1and the rank of S,, is equal to 
P-—C, where P is the total number of images, as will be discussed in Appendix A.3. Usu- 
ally in image classification applications, C—1<< P—C, as the number of classes is much 
smaller than the total number of images. Consequently, we have: 


Rank(S,S,) <min(C-1,P-C) =C-l, (A.14) 


in practical image classification applications, provided that S,, is not singular. Thus, the 


maximum number of nonzero eigenvectors extracted from Eq. (A.8) is equal to C—1. 


3. Computing the Maximum Rank of the Within—Class—Scatter Matrix 
Sw and of the Between—Class—Scatter Matrix Sz 


a. Maximum Rank of the Within—Class—Scatter Matrix 


In this section, we show that the N—dimensional within-class—scatter 


(WCS) matrix S,, has maximum rank equal to P—C, where P is the number of element 


vectors in the training set, and C is the number of classes. 
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According to Eqs. (2.9) and (2.10) from Chapter II.B.2, the N—dimensional 


WCS matrix Sy of the training data,{x,}_,_,, is defined as: 
Cc 
Sy => > (x, —m,)(x,-m,)’, (A.15) 


where n, and m, are defined as the number of training vectors and the mean vector for 
class i, respectively. Note that the number of elements per class may vary. As a result, the 


total number of element vectors is equal to P, Le., 


Sn, =P. (A.16) 


Our derivation is based on the following three matrix rank properties: 


1. The rank of the (N xN ) dimensional matrix xx’, where x is a column vec- 


tor of dimension N, is equal to 1 [Strang, 2003]. 


2: The maximum possible rank of the matrix S, defined as: 
P 
SS (A.17) 
i=1 
where {x;}_, _, are a set of column vectors, is equal to P. 
3. The N—dimensional covariance matrix defined from the set of element 
vectors, eee p- and given by: 
P 
Cov(X) = >) (x, —m)(x,-m)’, (A.18) 
i=] 


where the mean column vector m is defined as: 


1 P. 
Mm = — 
i=1 


x., A.19 
py ( ) 


has possible maximum rank equal to P —1, due to the fact that one degree of freedom is 

lost by the subtraction of the mean element vector in the covariance matrix definition. 
Note that the WCS matrix defined in Eq. (A.15) can be viewed as the 

summation of C class—specific covariance matrices, as defined in Eq. (A.18). Using prop- 


erties 2 and 3 listed above, each i" class—specific covariance matrix can be shown to have 
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a possible maximum rank equal to (n, —1). As a result, the overall WCS matrix, derived 


from the summation of these C class—specific covariance matrices has a maximum rank 
equal to P—C, because: 


SG -N=P-C. (A.20) 


i=l 
Note that the matrix S,, is N—dimensional. Thus, it is nonsingular only 


when (P—C)= N, [Strang, 2003]. Further, recall that P and N represent the number of 


images and the dimension of the reshaped images as column vectors, respectively. Usu- 
ally the number of images is smaller than the image dimensions, due to limitations in im- 


age collection, resulting in a singular S|, matrix. 


b. Maximum Rank of the Between—Class—Scatter Matrix 


Next, we show that the maximum rank of the Between—Class—Scatter— 


Matrix (BCS) S, is equal to C—1. 
Recall that the N—dimensional BCS matrix S, is defined as: 


n,(m,—m)(m,—m)' , (A.21) 


Me 


IL 
= 


S,= 


where m is the mean of all element vectors, n, is the number of element—vectors, and m; 
is the mean vector of class i, respectively. 

Note that the matrix S, is of the same type as the covariance matrix ex- 
pression defined in Eq. (A.18). Thus, it has a maximum possible rank equal to(C — 1) : 


due to the fact that one degree of freedom is lost by the subtraction of the mean element 


vector in the covariance matrix definition. 
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APPENDIX B. MATLAB SOURCE CODE 


This appendix lists all Matlab programs used in the study. These programs have 


been developed or borrowed from other references and modified according to the re- 


quirements of the current study. The Matlab programs-functions used, but not developed 


or modified within the current study, are only referenced without their Matlab code list- 


ing being provided. 


Below are listed the five categories of experiments being conducted in the subject 


study. The first program in each category is the one to be executed, whereas the follow- 


ing programs/functions are being called by it. 


A. DESCRIPTION 


1. 


PCA-LDA Comparison 


PCAvsLDA.m: Creates two 2—dimensional classes, applies PCA and LDA 
projection on them and compares the two projections. 


pca.m: Applies the PCA method on the data, [Lee, 2004]. 
fid.m: Applies the LDA method on the data, [Lee, 2004]. 


sortem.m: Sorts eigenvectors by decreasing eigenvalue magnitude [Lee, 
2004]. 


Distance Selection —3—Dimensional Plots Creation 


DistanceSelection.m: Applies the five distances analyzed in Chapter 
IV.A.2, to the training/testing images and the classes’ centroids to produce 
the respective 3—dimensional and associated contour plots. Additionally, 
creates the confidence score 3—dimensional maps when selecting the Ma- 
halanobis Angular distance. 


dataST.m: Normalizes input data by setting mean to 0 and standard devia- 
tion to 1. (From [Baudat, October 2000].). 


buildGDA_Opt.m: Applies the GDA algorithm to the input data and cre- 
ates the GDA data, which will be used for the input data projection (from 
[Baudat, October 2000]). This function is modified as commented, in or- 
der to vectorize the steps involved in the computations of the GDA projec- 
tion vectors. 


eigensystem.m: Performs eigenvalue—eigenvector decomposition of the in- 
put matrix and sorts the eigenvectors by decreasing eigenvalue magnitude 
(From [Baudat, October 2000].). 
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spreadGDA_Opt.m: Projects the input data onto the GDA eigenvectors 
(from [Baudat, October 2000]). This function is modified to vectorize the 
GDA projection step for higher computational efficiency. 


KernelFunctiont.m: It is called only when the Gaussian kernel function is 
selected by setting the parameter “degree” equal to 0 (From [Baudat, Oc- 
tober 2000].). 


GDA Classification Performance Measurement 


ClassificationPerformance.m: Computes and saves the rank—(1—50) scores 
and the classification performance per class. 


dataST.m: As above. 

buildGDA_Opt.m: As above. 

eigensystem.m: As above. 

spreadGDA_Opt.m: As above. 

KernelFunctiont.m: As above. 

Rank Score 1 Mean Confidence Interval Creation 


Confidencelnterval.m: Derives and plots the 95% confidence intervals ob- 
tained for rank—1 to —5 scores for average classification performances. 


LDA versus Fisherface Comparison, Using GDA with Polynomial 
Kernel Function of Degree 1 as LDA 


PCAGDAI.m: Computes classification performances for the LDA method, 
(implemented using the GDA with polynomial kernel function of degree 
1), and for the Fisherface method (implemented using the PCA and GDA 
with polynomial kernel of degree 1). 


pcea.m: As above. 

sortem.m: As above. 
buildGDA_Opt.m: As above. 
eigensystem.m: As above. 
dataST.m: As above. 
spreadGDA_Opt.m: As above. 


MATLAB CODE LISTING 


PCAvsLDA.m: 


To To 78 FS FH 2 2 He He He He He He He 2 2 2 2 2 2 2 ee fe fe he fe fe he he 26 26 26 26 28 28 28 28 28 28 28 2s 2s 2s 222 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2k ee 2k 2k 


%% Filename: 


PCAvsLDA.m 


%% Thesis Advisor: Prof.M.P.Fargues Naval Postgraduate School 
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%% Author: Captain Dimitrios I. Domboulas 


%o% Hellenic Air Force 

%% Date: May 2004 

%% Description: 1.Creates random two classes of 2—dimensional input vectors 
%% and apply PCA and LDA. 

%% 2.Plots the projections’ 

%o% of the two schemes and compares their separability. 

%% Input: N/A 

%% Outputs: 2—Dimensional projection plot 


%% Functions used: pca.m, fld.m, sortem.m 
%% ok 2k 2k 2k ok of of of of of oe of ois 2k 2k ok ok of of of of oe of 2s 2k 2k ok of of of of of of of ois 2k 2k ok ok of of of of of 2K 2k ok ok ok ok of of oi 2 2 ok 2K ok ok ok ok ok ok ok 


cle; 
clear; 


m1=[2;3]; %% mean for class 1 
sxl=2.8; %% standard deviation class 1 
sy 1=0.4; 

m2=[2;4]; %% mean for class 2 
sx2=2.7; %% standard deviation class 2 
sy2=0.4; 


% create 2—D data matrices 
for k=1:5; 
x1¢,K)=m1+[sx];sy1].*rand(2,1); % create class 1 
x2(:,k)=m2+[sx2;sy2].*rand(2,1); % create class 2 
end 


%% use of PCA 
x12=[x1,x2]; 
[W1,m1,Amean1,EVA1]=pca(x12,k); 


%% center each class==>subtract the mean vector of each class 
{m,n]=size(x1); 

mx 1=x1—mean(x1,2)*ones(1,n); 
mx2=x2—mean(x2,2)*ones(1,n); 


%% | D Projection 
pxl1=W1¢,1)'*x1; 
px12=W1(:,1)'*x2; 


figure 

whitebg('w'); 
plot({[—10,10]*W1(1,1),[-10,10]*W1(2,1));grid on;hold on; 
scatter( x1(1,:) , x1(2,:) ); grid on; 

% axis([—10,10,—10,10]); 
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axis([—2,6,-4,6]); 

hold on; 

scatter( x2(1,:) , x2(2,:), '*' ); 

hold on; 

% plot the 1—D projections 
scatter(px11*W1(1,1),px11*W1(2,1)); 
scatter(px12*W1(1,1),px12*W1(2,1),'*'); 


%% use of LDA 
C=[1,1,1,1,152,2,2,2,2]; 
[W2,D ]=fld(x12,C); 


%% find projected data (LDA eigenspace) 
px21=W2'*x1; 
px22=W2'*x2; 


% plot the LDA projection line 
plot({—10,10]*W2(1),[-10,10]*W2(2),'g—); grid on;hold on; 
hold on; 

% plot the 1—D projections 
scatter(px21*W2(1),px21*W2(2),'g'); 
scatter(px22*W2(1),px22*W2(2),'g*'); 

legend('PCA','LDA' ); 

title’PCA vs LDA Projections');xlabel('x');ylabel(‘y'); 


e DistanceSelection.m: 


To To FS FS FS A A He He He He He 2 He ae 2 2 2 2 a 2 2 ee fe fe fe fe fe he he 26 26 26 28 26 28 28 28 28 28 28 2s 2s 2s 2s 22 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2k 2 2 2k 


%% Filename: 


%% Thesis Advisor: 


%% Author: 
%% 

%% Date: 

%% Description: 
%% 

%% 

%% 

%% Parameters: 
%% 

%% 

%% 

%% 

%% Outputs: 
%% 


%% Functions used: 


%% 


DistanceSelection.m 

Prof.M.P.Fargues Naval Postgraduate School 

Captain Dimitrios I. Domboulas 

Hellenic Air Force 

July 2004 

Produces 3—dimensional distance map plots and associated con— 
tours for various types of distances between 

training/testing images and class centroids. 

Also produces the 3—dimensional confidence score plots/contours. 
1. Kernel function type (Gaussian, Polynomial) 

(degree of polynomial kernel) 

2. Number of K matrix eigenvectors kept 
3. Distance type to be used 
4. Number of iterations 

1. Respective 3—dimensional plots 

2. Respective contours. 

buildGDA_Opt.m, eigensystem.m, dataST.m, 
spreadGDA_Opt.m, KernelFunction.m 
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To To 78 FS A 2 He He He He He He 2 2 2 2 2 2h 2h 2 2 ee fe fe fe fe fe he he 26 26 26 28 26 28 28 28 28 28 2 2s 2 2s 2s 2 2 2 2 2 2 2 2h 2h 2 2 2 2 2 2 2 2 2k 2k ek 


cle; 

clear; 

load A_all; 

clear img A Db Dme x ans img_name N_classj}k mons sl sqrsl tem1 tem2 time; 
clear IndSamples N_objects Section im_num1l im_num?2 Person T v w; 

Atemp = double(A_all); 

clear A_all; 


%% CONSTANTS ' DECLARATION 

Train_Class_Size = 18; %% Number of Training Images per Class 
Test_Class_Size = 12; %% Number of Testing Images per Class 
Num_Train_Imag = 900; %% Total Number of Training Images 
Num_Test_Imag = 600; %% Total Number of Testing Images 

Num_Class = 50; %% Total Number of Classes 

cs = [18,18,18,18,18,18,18,18,18,18]; 

Class_Sizes = [cs, cs, cs, cs, cs]; %%(1 X 50) vector, with class sizes per class 
degree = 2; %%% select Polynomial kernel degree, USE 0 ==> for GAUSSIAN kernel 
ONLY 

Num_ev = 0; %%% input Number of K matrix eigenvectors (0 => Default) 
select_dist = 2; %%% select distance 

Num_Iter= 1; %%% Number of MAIN LOOP ITERATIONS 


for loop = 1: Num_Iter; %%%%% MAIN ITERATION LOOP %%%%%%%%% 
tic %% start timing loops 


Too V0 To Vo No YoU To Yo No Yo Vo Vo Yo Yo Yo Yo Vo Yo To Yo Vo No Yo Yo Vo Yo No Vo Yo Vo Yo Yo Vo Yo Yo Vo Yo Vo Yo Yo 
%%%% Random Permutation Section Borrowed from [Lee, 2004] and modified by 
%%%% [DOMBOULAS, 2004] 


A = Atemp; %% A contains the Training Images (2700x900) 
B = []; %% B contains the testing images (2700x600) 
Delete = []; 


h = waitbar(0, ‘Random Permutation'); %%%% WAITBAR 
for 1=1 : 150 % 1 goes from 1 to the total number of pictures 
waitbar( i/150 ); %%%% WAITBAR 
rp = randperm(10); 
L1 =(d-1) * 10 +1; 
L2 = i*10; 
A(:,L1:L2)=A(:, G1) * 10 + rp ); %% permute images 
IndSamples = [L1 L1+1 L1+2 L1+3]; 
B = [B A(: , IndSamples )]; 
Delete = [Delete IndSamples]; 
end 
close(h); %%%% WAYITBAR 
A(: , Delete ) = []; 
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%%%% Section Borrowed from [Lee, 2004] and modified by [Domboulas, 2004] 
Joo V0 To V0 No YoU Yo Yo No Vo Yo Vo Yo Yo Yo Yo Vo Yo To Yo Vo No Yo Vo Vo Yo No Vo Vo Vo Yo Yo Vo Vo To Yo Yo Vo Yo Vo Yo 


A = DataSt( A ); %% Mean =0, Standard Deviation = 1 

B = DataSt( B ); %% Mean =0, Standard Deviation = | 

Le = A'; %%% Learning Data (900 Images are rows) 

Te = B'; %% 600 Testing images of our Data Base (Images are rows) 
[dataGDA,centeredkM,kM] = buildGDA_Opt( Le , Class_Sizes , degree, Num_ev ); 
%%run GDA 

Pgda_A = spreadGDA_Opt( Le, Le, dataGDA, degree ); %% Projected Training images 
in GDA 

Pgda_B = spreadGDA_Opt( Te, Le, dataGDA, degree ); %% Projected Testing images in 
GDA 


%%% FIND VARIANCE of each column of the 900X49 Projected Train Data Matrix. 
%%% Covariance is used for other Image Distance types 
fork = 1: size( Pgda_A , 2 ); 
Pr_Tr_Dat_Cov_Col(k) = cov( Pgda_A(:,k) ); %% Row Vector 1X49 with variances 
%%% of each of the 49 columns of the Projected Training Data Matrix 900X49 
end; 
Pr_Tr_Dat_Cov_Mat = cov(Pgda_A); %% 49X49 covariance matrix of Proj Train Data 
Matrix 
%%%%%% It is Diagonal! !Invertible & Each Dimension 's features are 
%%%%%% independent from each othert ==> It is the Ideal case 
Inv_Cov = inv( Pr_Tr_Dat_Cov_Mat ); %%% Inverse Covar Matr 
%%% For the Approximated Mahalanobis Distance 
for k = 1 : size( Inv_Cov,2 ); 
s_inv(k) = Inv_Cov(k,k); %% ROW VECTOR 1 x 49 
end; 
sr_s_inv = sqrt(s_inv(k)); %% Row Vector 
sr_inv_diag = diag( sr_s_inv ); %%Diagonal Matrix with diag elem the sq roots of 
%%7% the inverse diag elem of cov matrix 49 x 49 
TT TV FOV VV %%%%_ END VARIANCE SECTION %%%%%%%L0%%%% 











%% Find Centroids per Each Class Projected Training Images 

for k= 1 : Num_Class 

ProjTrainCentr( k , : ) = mean( Pgda_A( ((k—1)*Train_Class_Size + 1) : (k * 
Train_Class_Size ),: ) ); 

end; 


%%7% create cell array 
DistName = {'Norm—2 's"Mahalanobis i 
"Mahalanobis Norm—2 ';'Mahalanobis Norm-—1 ';'Mahalanobis Angular'}; 
%% Find Various Distances of Each Training Image from all the Classes ' Centroids, 
%% (TrainDist 900 X 50) 
for k = 1 : Num_Train_Imag ; 
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for m = 1: Num_Class; 


if select_dist == 1; %% NORM-2 Distance 
TrainDist(k,m) = norm( Pgda_A(k,:) — ProjTrainCentr(m,:),2_); 


elseif select_dist == 2; %% Mahalanobis Distance 
TrainDist(k,m) = sqrt( (Pgda_A(k,:) — ProjTrainCentr(m,:)) * Inv_Cov * 
(Pgda_A(k,:) — ProjTrainCentr(m,:))' ); 


elseif select_dist == 3; %% Mahalanobis NORM-—2 Distance 

temp = ( sr_s_inv .* ( Pgda_A(k,:) — ProjTrainCentr(m,:) ) ) * ( sr_s_inv .* ( 
Pgda_A(k,:) — ProjTrainCentr(m,:) ) )' ; 

TrainDist(k,m) = sqrt( sum( temp ) ); %% Approxim Mahalan Dist 


elseif select_dist == 4; %9% Mahalanobis NORM-—1 Distance 
TrainDist(k,m)=sr_s_inv .* norm( Pgda_A(k,:) — ProjTrainCentr(m,:),1 ); 


else %%%% MAHALANOBIS ANGULAR DISTANCE 

TrainDist(k,m) = acos( ( ( Pgda_A(k,:) * sr_inv_diag ) * ( sr_inv_diag * ProjTrain- 
Centr(m,:)' ) ) / (norm( Pgda_A(k,:).*sr_s_inv ,2 ) * norm( ProjTrain- 
Centr(m,:).*sr_s_inv ,2 ) ) ); 


end; %%% end if 
end; 
end; 


%%% Plot in 3—D the 900 Projected Training Images ' Distances from the 

%%% Classes ' Centroids 

for n=l1:Num_Train_Imag %% Create Training Images ' Indeces for the “mesh” function 
Pgda_AIndex(n)=n; 

end; 

for d=1:Num_Class %% Create Classes ' Indeces for the “mesh” function 
Class_Index(d) = d; 

end; 

figure; 

mesh(Class_Index,Pgda_AIndex,TrainDist); 

xlabel(‘Class Index');ylabel('Training Image Index');zlabel(‘Distance'); 

title( DistName(select_dist) ); %%% Num_Eigv(select_file) 

figure; 

mesh(Class_Index,Pgda_AIndex,TrainDist); 

xlabel(‘Class Index');ylabel('Training Image Index');zlabel(‘Distance'); 

title( DistName(select_dist) ); %%% Num_Eigv(select_file) 

view(0,90); %9% view 3—D plot from top 

axis([{1,50,1,900]); 


%%% Find Various Distances for Testing Images 
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for k = 1 : Num_Test_Imag ; 
for m= 1: Num_Class; 


if select_dist == 1; %% NORM-2 Distance 
TestDist(k,m) = norm( Pgda_B(k,:) — ProjTrainCentr(m,:),2_); 


elseif select_dist == 2; %% MAHALANOBIS DISTANCE 
TestDist(k,m) = sqrt( (Pgda_B(k,:) — ProjTrainCentr(m,:)) * Inv_Cov * (Pgda_B(k,:) — 
ProjTrainCentr(m,:))' ); %% Mahalanobis Distance 


elseif select_dist == 3; %% Mahalanobis NORM-—2 Distance 

temp = ( sr_s_inv .* ( Pgda_B(k,:) — ProjTrainCentr(m,:) ) ) * ( sr_s_inv .* ( 
Pgda_B(k,:) — ProjTrainCentr(m,:) ) )' ; 

TestDist(k,m) = sqrt( sum( temp ) ); %% Approxim Mahalan Dist 


elseif select_dist == 4; %% Mahalanobis NORM-1 Distance 
TestDist(k,m)=sr_s_inv .* norm( Pgda_B(k,:) — ProjTrainCentr(m,:),1 ); 


else %%%% MAHALANOBIS ANGULAR DISTANCE 

TestDist(k,m) = acos( ( ( Pgda_B(k,:) * sr_inv_diag ) * ( sr_inv_diag * ProjTrain- 
Centr(m,:)' ) ) / (norm( Pgda_B(k,:).*sr_s_inv ,2 ) * norm( ProjTrain- 
Centr(m,:).*sr_s_inv ,2 ) ) ); 


end; %%% end if 


end 
end 


%%% Plot in 3—D the 600 Projected Testing Images ' Distances from the 

%%% Classes ' Centroids 

for n=l1:Num_Test_Imag %%Create Testing Images' Indeces for “mesh” function 
Pgda_BIndex(n)=n; 

end; 

for d=1:Num_Class %% Create Classes ' Indeces for the “mesh” function 
Class_Index(d) = d; 

end; 

figure; 

mesh(Class_Index,Pgda_BIndex,TestDist); 

xlabel(‘Class Index');ylabel('Testing Image Index');zlabel('Distance’); 

title( DistName(select_dist) ); %%% Num_Eigv(select_file) 

figure; 

mesh(Class_Index,Pgda_BIndex,TestDist); 

xlabel(‘Class Index');ylabel('Testing Image Index');zlabel('Distance’); 

title( DistName(select_dist) ); %%% Num_Eigv(select_file) 

view(0,90); %% view 3-D plot from top 

axis([{1,50,1,600]); 
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%%% Assign 1-50 Class Indeces for each of the 600 test Images 

T_B = [1,1,1,1,1,1,1,1,1,1,1,1); 

for m= 1 : Num_Class; 

T_B( ( Test_Class_Size * (m—1) + 1 ) : (Test_Class_Size *m) ) =m * T_B( 
1:Test_Class_ Size) ; 

end; %%% T_B contains the class number of each testing image (1 X600) 


%%% Sort Distances per Testing Image -RANK 

for k = 1 : Num_Test_Imag ; 

temp! = sort( TestDist(k,:) ); 

I(k) = find( (temp1 == TestDist(k,T_B(k))*ones(1,Num_Class)) ); 
end; 


%%% Performance Measurement Per Class 
for k= 1: Num_Class; 
temp2 = find( I( ((k—1)*Test_Class_Size + 1) : (k * Test_Class_Size ) ) == ones( 
1,Test_Class_Size ) ); 
Perf_Class(k) = size( temp2 , 2 ) / Test_Class_Size; 
end; 
figure; 
stem(Perf_Class);grid on; 
title(‘Classification Performance per Class’); 
xlabel('Class Index');ylabel(‘Classification Performance’); 


%%% Find the probability of each of the 50 Ranks assigned to all the testing images 
for k =1 : Num_Class ; 

temp3 = find( I == k * ones( 1,Num_Test_Imag ) ); 

Prob(k) = size( temp3 , 2 ) / Num_Test_Imag; 
end; 


%%%% RANK SCORE Evaluation — Cummulative Probability of Ranks — Incresing 
Order 
Rank(1) = Prob(1); 
for k = 2 : Num_Class ; 
Rank( k ) = Rank( k—-1 ) + Prob( k ); %9%Rank is 1X50 with accumulated Probability 
end; 
figure; 
stem(Rank); grid on; axis([0,52,0.8,1.1]); 
title(Cummulative Rank Score for 600 Testing Images'); 
xlabel('Rank Index');ylabel('Cumulative Rank Score’); 


if select_dist == 5; %%% Confidence Score ONLY for Mahal Angular distance 

%%% Find Confidence Score for each Test Image. This will be 1X50 vector per 

%%% image, which will assign the confidence assignment of the test image to each class. 
mm=2; %%% parameter for confidense calculation 
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for j = 1: Num_Test_Imag ; 
fork = 1 : Num_Class ; 
%% d has the distances of proj test imag j from all the proj class centroids 
%% d(k,j) = norm( Pgda_BG,:) — ProjTrainCentr(k,:) , 2 ); %%%% ONLY for 
NORM-2 distance 
%%I%% ONLY for Mahalanobis Angular Distance 
dqj,k) = acos( ( ( Pgda_BQ,:) * sr_inv_diag ) * ( sr_inv_diag * ProjTrainCentr(k,:)' ) ) 
/(norm( Pgda_BG,:).*sr_s_inv ,2 ) * norm( ProjTrainCentr(k,:).*sr_s_inv ,2 ) ) ); 
end 
dd=d'; %%% ONLY For Mahalanobis Angular 
fori = 1: Num_Class ; 
%% w has the confidence of assigning test imag j to class k 
u(i,j) = 1 / sum( ( dd(i,j) * ones(Num_Class,1) ./ dd(:,j) ) .4(2/Qmm-1)) ); %%%% 
NORM-2 CASE 
end 
end 
uu=u'; %%% ONLY For Mahalanobis Angular 


%%% Plot in 3—-D the 600 Projected Testing Images ' Distances from the 

%%% Classes ' Centroids 

forn = 1: Num_Test_Imag; %%Create Testing Images' Indeces for the “mesh” function 
Pgda_BIndex(n) = n; 

end; 

for d= 1 : Num_Class; %% Create Classes ' Indeces for the “mesh” function 
Class_Index(d) = d; 

end; 

figure; 

mesh(Class_Index,Pgda_BIndex,uu); 

xlabel('Class Index');ylabel('‘Testing Image Index’);zlabel(‘Score’); 

title( "Testing Images Confidence Score' ); %%% Num_Eigv(select_file) 

axis([{1,50,1,600,0,1]); 

figure; 

mesh(Class_Index,Pgda_BIndex,uu); 

xlabel(‘Class Index');ylabel('Testing Image Index');zlabel(‘Score’); 

title( "Testing Images Confidence Score' ); %%% Num_Eigv(select_file) 

axis([{1,50,1,600,0,1]); 

view(0,90); %9% view 3-D plot from top 

end; %%% end if 


Rank_718(loop,:) = Rank; 
Perf_Class_718(loop,:) = Perf_Class; 


clear A TA B TB f L1 L2 Le Te dataGDA centeredkM kM; 
clear Pgda_A Pgda_B k j Pr_Tr_Dat_Cov_Col Pr_Tr_Dat_Cov_Mat; 
clear TrainDist TestDist temp Pgda_Aindex Class_Index; 

clear Inv_Cov s_inv sr_inv_diag ProjTrainCentr DistName; 
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clear Pgda_BIndex Class_Index T_B temp1 I temp2 temp3; 
clear Prob Delete d dim1l dim2 h i m nsr_s_inv; 
clear Rank Perf_Class Pgda_AIndex TestDist cs rp; 


time(loop) = toc; %% Stop timing each Main Iteration Loop 
end; %%%%%% END OF MAIN ITERATION LOOP %%%%%%%%%% VV % 


° buildGDA _Opt.m: 


To To FS FR A 2 He He 2 He He He He 2 2 2 a ae 2 2 2 ee fe fe he fe fe he 6 26 26 26 28 28 28 28 28 28 28 28 28 2s 2222 2 2 2 2 2 2 2 2 2 2 2 2 2c 2 2 2c ee 2k 


%% Filename: buildGDA_Opt.m 


%% Thesis Advisor: Prof.M.P.Fargues Naval Postgraduate School 
%% Author: G. Baudat, F. Anouar (“buildGDA.m’—21 October 2000) 

%% Modified by: Captain Dimitrios I. Domboulas 

%o% Hellenic Air Force 


%% Date modified: July 2004 
%% Description: | Creates the GDA data from the input training 
%o% data. 


%% Inputs: 1. Normalized training data matrix (m=0,stdev=1) 
%% 2. Vector with number of element vectors per class 
%o% 3. Kernel Function type and degree 

%o% 4. Number of Kernel matrix K eigenvectors kept 
%% Outputs: 1. GDA data 

%o% 2. Centered K matrix 

%% 3. Uncentered K matrix 


%% Functions used: eigensystem.m, KernelFunction.m 
%% De ee ee ee ee 


%%% Modified 041115 
function [dataGDA,centeredkM,kM]=BuildGDA_Opt(L,S,degree,Num_ev) 


n=length(S); 
m=length(L(:,1)); 
mM=ones(m)/m; 


%%% Create kernel matrix (uncentered) 
if degree == 0; %%% ADDED by D. Domboulas 
%%%% Added waitbars by D. Domboulas for code optimization 
kM=zeros(m); 
h=waitbar(0,'Create Matrix K’'); TT %%%% WAITBAR 
for i=l:m 
waitbar(i/m,h); T%%%%% WAITBAR 
for j=l:m 
kM(,j)=KernelFunction(Ld,:),LG,:)); 
end 
end 
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close(h); %%%%%% WAITBAR 


%%% ADDED by D.DOMBOULAS FOR POLYNOMIAL KERNEL FUNCTIONS 
ONLY 

else 

kM =(L*L' ).Adegree; %%%Matrix Vectorization 

end; %%% end if 

%%% ADDED by D.DOMBOULAS FOR POLYNOMIAL KERNEL FUNCTIONS 
ONLY 


sumK = mM * kM; 


% center the kernel matrix 
centeredkKM=kM-sumK'—sumK+sumK*mM; 
clear mM; 


% decomposition of the centered kernel for beta eigenvectors 
[vecCkM, valCkM]=eigensystem(centeredkM); 


TTT %%%%%%% Plot K matrix Eigenvalues ==> ADDED by D.DOMBOULAS 
for k=1:size(valCkM) 
ValK(k)=valCkM(k,k); 
end 
ValK(1:50); 
figure;plot(ValK);grid on; 
xlabel('Eigenvalue Index’);ylabel(‘Eigenvalue Magnitude’); 
title’GDA K Matrix Eigenvalues’); 
L000 %U%%%% Plot K matrix Eigenvalues ==> ADDED by D.DOMBOULAS 


minVal=valCkM(1,1)/1000; %define the lowest eigen value used 


if Num_ev == 
% % rankValCkM=250; %%% Test Reliability of Results 
rank ValCkM=m; %%% Default numbre of eigvectors kept. 


%%%%%% ADDED by D. Domboulas 

else 

rank ValCkM = Num_ev; %%% keep this number of eigvectors 
end 

%T%%%%%% ADDED by D. Domboulas 


h=waitbar(0,'Center K Matrix with Highest Eigvalues'); %%%% WAITBAR 
for 1=1:m 
waitbar(i/m,h); %%%% WAITBAR 
if valCkM(i,i)<minVal; 
valCkM(i,i)=0; 
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rank ValCkM=rank ValCkM-1; 
end 
end 
close(h); %%%% WAITBAR 


valCkM=valCkM(1:rank ValCkM, | :rank ValCkM); 
vecCkM=vecCkM(.:,1:rank ValCkM); 

%% centered K matrix with only the highest eigen values/vectors 
centeredkKM=vecCkM*valCkM*vecCkM'; 


Yow matrix of the weights, bloc diagonal 
wMz=zeros(m); 


stBloc=1; 
endBloc=0; 
h=waitbar(0,'Create Matrix W'); %%%% WAITBAR 
for i=I:n 
waitbar(i/n,h); %%%% WAITBAR 


endBloc=endBloc+S(i); 
for j=stBloc:endBloc 
for k=stBloc:endBloc 
wMg,k)=1/S(); 
end 
end 
stBloc=stBloc+S(i); 
end 
close(h); %%%% WAITBAR 


%compute alpha normalized vectors 

nbAxes=min([{rank ValCkM],[n—1]); 
[vecBeta,valBeta]=eigensystem(vecCkM'*wM*vecCkM); 
tmp=zeros(1,nbAxes); 


h=waitbar(0,'Create TEMP Matrix’); %%%% WAITBAR 
for i=1:nbAxes ; 
waitbar(i/nbAxes,h); %%%% WAITBAR 
tmp(i)=valBeta(i,i); 
end 
close(h); %%%% WAITBAR 


% fprintf(‘Inertia: %f\n',tmp); 
tmp=vecCkM*inv(valCkM)*vecBeta; 
norAlpha=zeros(m,nbA xes); 


h=waitbar(0,'Create norAlpha matrix’); %%%% WAITBAR 
for i=1:nbAxes ; 
waitbar(i/nbAxes,h); %%%% WAITBAR 
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norAlpha(:,i)=tmp(:,i)/sqrt(tmp(:,1)'*centeredkM*tmp(:,i)); 
end 
close(h); %%%% WAITBAR 


sumK=sumK(1,:); 
dataGDAsstruct(‘norAlpha' norAlpha,'sumK’',sumK); 


e spreadGDA_Opt.m: 


To To FF FR HH A 2 He He He He He He 2 2 2 2 2 2 ee fe he fe fe he fe 6 6 26 26 26 26 28 28 28 28 28 28 2 2 2s 2s 222 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2k 2 kk 


%% Filename: spreadGDA_Opt.m 

%% Thesis Advisor: Prof.M.P.Fargues Naval Postgraduate School 
%% Author: G. Baudat, F. Anouar (“buildGDA.m’”—21 October 2000) 
%% Modified by: Captain Dimitrios I. Domboulas 

%% Hellenic Air Force 

%% Date modified: July 2004 

%% Description: Projects the input GDA data on to Kernel matrix 
%% K eigenvectors. 

%% Inputs: 1. Normalized testing data (m=0, stdev=1) 

%% to be projected on the GDA data eigvectors 
%% 2. Normalized training data (m=0, stdev=1), 
%% which is used to build the GDA data 

%% 3. GDA data 

%% 4. Kernel Function type and degree 

%% Outputs: Projected data on the Kernel matr K eigvectors 


%% Functions used: KernelFunction.m 
%% ok 2k 2k 2k ok ok of of of of of 2 ois 2k 2k ok of ok of of of 2 of 2s 2 2k ok of of of of of of of 2K ok 2k ok of of of of of of of 2s ok ok ok ok of of of 2 of 2K 2K ok ok ok ok ok ok ok 


function projected Vectors=SpreadGDA_Opt(T,L,dataGDA, degree); 
n=length(T(:, 1)); Yosize of the test set 

m=length(L(:,1)); Yosize of the learning set 
KernelEva=zeros(n,m); 


%%% Waitbars added by D. Domboulas for code optimization 

if degree == 0; %%% ADDED by D. Domboulas 

h=waitbar(0,'Center K Matrix for SPREADGDA'); 2%%% WAITBAR 
for i=I:n 


waitbar(i/n,h); %%%% WAITBAR 
tmp=zeros(1,m); 
for j=l:m 
tmp(j)=KernelFunction(T(,:),LG,:)); 
end 
KernelEva(i,:)=tmp—sum(tmp)/m; 
end 
close(h); %%%% WAITBAR 
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TTT %%%%% ADDED by D. DOMBOULAS for Matrix Vectorization 
TUT %%%% ONLY for POLYNOMIAL KERNELS 

else 

temp! = (T*L').“degree; %%% Matrix Vectorization 

KernelEva = temp1 — mean(temp1,2) * ones(1,m); %% Remove Mean 

end; %%% end if 

TTT F%%%%% ADDED by D. DOMBOULAS for Matrix Vectorization 
TV %%% ONLY for POLYNOMIAL KERNELS 


inter=dataGDA.sumK-sum(dataGDA.sumK)/m; 

h=waitbar(0,! PROJECTION SPREADGDA'); %%%% WAITBAR 

fori=1:n; 

waitbar(i/n,h); %%%% WAITBAR 
KernelEva(i,:)=KernelEvad,:)—inter; 

end 

projected Vectors=KernelEva*dataGDA.norAlpha; 

close(h); %%%% WAITBAR 


e ClassificationPerformance.m: 


To To 8 FS FS HH 2 He 2 He He He 2 2 2 2 2 2 2h 2 2 ee he fe fe fe fe he 6 26 26 26 28 28 28 28 28 28 28 28 28 2s 2 2s 2 2 2 2 2 2 2 2 2 2 2 2 2 2c 2k ee 2k 


%% Filename: ClassificationPerformance.m 

%% Thesis Advisor: Prof.M.P.Fargues Naval Postgraduate School 
%% Author: Captain Dimitrios I. Domboulas 

%% Hellenic Air Force 

%% Date: July 2004 

%% Description: Computes classification performance of the GDA 
%% method, for various kernel functions 

%% Parameters: 1. Kernel function type (Polyn degree/Gaussian) 
%% 2. Number of K matrix eigenvectors kept 

%% 3. Distance type to be used 

%% 4. Number of iterations 

%% Outputs: 1. Rank—(1—50) Scores (in “.mat” file) 

%% 2. Classification Performance per Class 

%% (in “.mat” file) 

%% Functions used: buildGDA_Opt.m, eigensystem.m, dataST.m, 
%o% spreadGDA_Opt.m, KernelFunction. 

%% FAS fe is FAS 2g 2s FAS 2fg 2s FAS 2fg 2s FAS 2fe 2s 2S 2fg 2s 2S 2g is 2S oie 2s 2S fe 2s FIs 2fe 2s 2S fg 2s 2S fg 2s 2S oie 2s 2S 2s 2s 2S 2fe 2s 2s oie 2s 2s oie 2s 2s ois 2s IS oie 2s 2s oie 2s 2s 2k 2 2 ok 
cle; 

clear; 


tic %% start timing 1000 iterations 


load A_all; 


clear img A Db Dme x ans img_name N_classj}k mn s sl sqrsl tem1l tem2 time; 
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clear IndSamples N_objects Section im_num1l im_num?2 Person T v w; 
Atemp = double(A_all); 
clear A_all; 


%% CONSTANTS ' DECLARATION 

Train_Class_Size = 18; %% Number of Training Images per Class 
Test_Class_Size = 12; %% Number of Testing Images per Class 
Num_Train_Imag = 900; %% Total Number of Training Images 
Num_Test_Imag = 600; %% Total Number of Testing Images 

Num_Class = 50; %% Total Number of Classes 

cs = [18,18,18,18,18,18,18,18,18,18]; 

Class_Sizes = [cs, cs, cs, cs, cs]; %%(1 X 50) vector, with class sizes per class 
degree = 2; %% select Polynom kernel degree, USE 0 ==> for GAUSSIAN ker ONLY 
Num_ev = 0; %%% input Number of K matrix eigenvectors (0 => Default) 
select_dist = 2; %%% select distance 

Num_Iter= 1; %%% Number of MAIN LOOP ITERATIONS 


h=waitbar(0,, MAIN ITERATION LOOP’); %%%% WAITBAR 


for loop = 1: Num_Iter; %%%%% MAIN ITERATION LOOP 
ToT To To Mo To Yo Mo Mo Yo Mo Yo Yo Yo To Yo Mo To Yo Mo To Yo Mo Vo Yo Yo Vo Yo Yo 


waitbar(loop/Num_Iter , h); %%%% WAITBAR 
JoTo To Vo Vo To Vo Vo Mo Vo To Vo Yo Vo Vo Vo Vo Vo Yo Vo Vo Vo Vo Vo Vo Vo No Vo Vo Vo Vo No Vo Vo Vo Vo Yo Vo Vo Yo Vo Vo 


%%%% Random Permutation Section Borrowed from [Lee, 2004] and modified by 
%%%% [DOMBOULAS, 2004] 


A = Atemp; %% A contains the Training Images (2700x900) 
B = []; %% B contains the testing images (2700x600) 
Delete = []; 


% %  h=waitbar(0, ‘Random Permutation'); %%%% WAITBAR 
for 1=1 : 150 % 1 goes from | to the total number of pictures 
% % ~~ waitbar( i/150 ); %%%% WAITBAR 
rp = randperm(10); 
L1 =(-1) * 1041; 
L2 = 1*10; 
A(:,L1:L2)=A(:, @-1) * 10 +rp ); %% permute images 
Index = [L1 L1+1 L142 L143]; 
B =[B AC: , Index )]; 
Delete = [Delete Index]; 
end 
% % ~~ close(h); %%%% WAITBAR 
AC: , Delete ) = []; 
%%%% Section Borrowed from [Lee, 2004] and modified by [Domboulas, 2004] 
JT % To No Lo % Vo Vo No Vo Vo Vo To Vo Vo Vo Vo Vo Lo Yo Vo No Yo Vo Vo Yo U0 Vo To Vo Vo Vo Vo To Yo Vo Vo Vo Yo Vo Yo 
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A = DataSt( A ); %% Mean =0, Standard Deviation = 1 

B = DataSt( B ); %% Mean =0, Standard Deviation = | 

Le = A'; %%% Learning Data (900 Images are rows) 

Te = B'; %% 600 Testing images of our Data Base (Images are rows) 
[dataGDA,centeredkM,kM] = buildGDA_Opt( Le , Class_Sizes , degree, Num_ev ); 
%%run GDA 

Pgda_A = spreadGDA_Opt( Le, Le, dataGDA, degree ); %% Projected Training images 
in GDA 

Pgda_B = spreadGDA_Opt( Te, Le, dataGDA, degree ); %% Projected Testing images in 
GDA 


%%% FIND VARIANCE of each column of the 900X49 Projected Train Data Matrix. 
%%% Covariance is used for other Image Distance types 
fork = 1: size( Pgda_A , 2 ); 
Pr_Tr_Dat_Cov_Col(k) = cov( Pgda_A(:,k) ); %% Row Vector 1X49 with variances 
%%% of each of the 49 columns of the Projected Training Data Matrix 900X49 
end; 
Pr_Tr_Dat_Cov_Mat = cov(Pgda_A); %% 49X49 covariance matrix of Proj Train Data 
Matrix 
%%%%%% It is Diagonal! !Invertible & Each Dimension 's features are 
%%%%%% independent from each othert ==> It is the Ideal case 
Inv_Cov = inv( Pr_Tr_Dat_Cov_Mat ); %%% Inverse Covar Matr 
%%% For the Approximated Mahalanobis Distance 
for k = 1 : size( Inv_Cov,2 ); 
s_inv(k) = Inv_Cov(k,k); %% ROW VECTOR 1 x 49 
end; 
sr_s_inv = sqrt(s_inv(k)); %% Row Vector 
sr_inv_diag = diag( sr_s_inv ); %%Diagonal Matrix with diag elem the sq roots of 
%%7% the inverse diag elem of cov matrix 49 x 49 
ToT To To Yo Vo To Vo Yo To No Yo Yo V0 Vo Yo Uo Yo Vo Yo Vo V0 Vo Yo Uo Vo Vo YO VLEND: VARIANCE SEC- 
TION 











%% Find Centroids per Each Class Projected Training Images 

for k= 1 : Num_Class 

ProjTrainCentr( k , : ) = mean( Pgda_A( ((k—1)*Train_Class_Size + 1) : (k * 
Train_Class_Size ),: ) ); 

end; 


%%% create cell array 
DistName = {'Norm—2 's"Mahalanobis ts 
"Mahalanobis Norm—2 ';'Mahalanobis Norm-1 ';'Mahalanobis Angular'}; 


%%% Find Various Distances for Testing Images 


for k = 1 : Num_Test_Imag ; 
for m= 1: Num_Class; 
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if select_dist == 1; %% NORM-2 Distance 
TestDist(k,m) = norm( Pgda_B(k,:) — ProjTrainCentr(m,:),2 ); 


elseif select_dist == 2; %% MAHALANOBIS DISTANCE 
TestDist(k,m) = sqrt( (Pgda_B(k,:) — ProjTrainCentr(m,:)) * Inv_Cov * (Pgda_B(k,:) — 
ProjTrainCentr(m,:))' ); %% Mahalanobis Distance 


elseif select_dist == 3; %% Mahalanobis NORM-2 Distance 

temp = ( sr_s_inv .* ( Pgda_B(k,:) — ProjTrainCentr(m,:) ) ) * ( sr_s_inv .* ( 
Pgda_B(k,:) — ProjTrainCentr(m,:) ) )' ; 

TestDist(k,m) = sqrt( sum( temp ) ); %% Approxim Mahalan Dist 


elseif select_dist == 4; %% Mahalanobis NORM-1 Distance 
TestDist(k,m)=sr_s_inv .* norm( Pgda_B(k,:) — ProjTrainCentr(m,:),1 ); 


else %%%% MAHALANOBIS ANGULAR DISTANCE 

TestDist(k,m) = acos( ( ( Pgda_B(k,:) * sr_inv_diag ) * ( sr_inv_diag * ProjTrain- 
Centr(m,:)' ) ) / (norm( Pgda_B(k,:).*sr_s_inv ,2 ) * norm( ProjTrain- 
Centr(m,:).*sr_s_inv ,2 ) ) ); 


end; %%% end if 


end 
end 


%%% Assign 1-50 Class Indeces for each of the 600 test Images 

T_B = [1,1,1,1,1,1,1,1,1,1,1,1); 

for m= 1 : Num_Class; 

T_B( ( Test_Class_Size * (m—1) + 1): (Test_Class_Size *m) ) =m * T_B( 
1:Test_Class_Size) ; 

end; %%% T_B contains the class number of each testing image (1 X600) 


%%% Sort Distances per Testing Image -RANK 

for k = 1 : Num_Test_Imag ; 

temp! = sort( TestDist(k,:) ); 

I(k) = find( (temp1 == TestDist(k,T_B(k))*ones(1,Num_Class)) ); 
end; 


%%% Performance Measurement Per Class 
for k= 1: Num_Class; 
temp2 = find( I( ((k—1)*Test_Class_Size + 1) : (k * Test_Class_Size ) ) == ones( 
1,Test_Class_Size ) ); 
Perf_Class(k) = size( temp2 , 2 ) / Test_Class_Size; 
end; 


%%% Find the probability of each of the 50 Ranks assigned to all the testing images 
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for k =1 : Num_Class ; 
temp3 = find( I == k * ones( 1,Num_Test_Imag ) ); 
Prob(k) = size( temp3 , 2 ) / Num_Test_Imag; 

end; 


%%%% RANK SCORE Evaluation — Cummulative Probability of Ranks — Incresing 
Order 
Rank(1) = Prob(1); 
for k = 2 : Num_Class ; 

Rank( k ) = Rank( k—-1 ) + Prob( k ); %% Rank is 1X50 with accumulated Probability 
end; 


Rank_718(oop,:) = Rank; 
Perf_Class_718(loop,:) = Perf_Class; 


clear A TA B TB f L1 L2 Le Te dataGDA centeredkM kM; 
clear Pgda_A Pgda_B k j Pr_Tr_Dat_Cov_Col Pr_Tr_Dat_Cov_Mat; 
clear TrainDist TestDist temp Pgda_Aindex Class_Index; 

clear Inv_Cov s_inv sr_inv_diag ProjTrainCentr DistName; 

clear Pgda_BIndex Class_Index T_B temp1 I temp2 temp3; 

clear Prob Delete d diml dim2 i m nsr_s_inv Index; 

clear Rank Perf_Class Pgda_AIndex TestDist cs rp; 











% % save Mah_Ang_Rank_PerfClas_GDA_0_7_040926 Rank_718 Perf_Class_718 ; 
save Mah_Ang_ Rank_PerfClas_GDA_2_041116 Rank_718 Perf_Class_718 ; 


end; %%%%%% END OF MAIN ITERATION LOOP %%%%%%%% %o%% VG Vo 
close(h); ToT To VV V0 V0 V0 % %o% W AITBAR 


time(loop) = toc; %% Stop timing each Main Iteration Loop 


e ConfidenceInterval.m: 


To To 8 FS FS A 2 He He He He He He He 2 2 2 2 2 2 2 2 ee fe fe fe fe fe he he 26 26 26 28 28 28 28 28 28 28 2s 28 2s 2s 22 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2k 2 2 2k 2k 


%% Filename: ConfidenceInterval.m 

%% Thesis Advisor: Prof.M.P.Fargues Naval Postgraduate School 

%% Author: Captain Dimitrios I. Domboulas 

%% Hellenic Air Force 

%% Date: August 2004 

%% Description: Plots the Confidence Interval of the 

%% rank—1 to rank—5 score means of 5 different 

%% distances used for classification. 

%% Also, computes the mean rank—1 score for all 

%o% “mat” files created by “ClassificationPerformance.m”’ 
%% Input: “mat” files created by “ClassificationPerformance.m” 
%% Outputs: 1. Confidence Interval plot 

%o% 2. Rank—1 score mean for each file. 
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%9% Functions used: N/A 
%% ok oi 2k 2k ok ok of of of of of ois ois ok 2k ok of of of of of of of 2s 2 2k ok of of of of of of 2k 2 2k 2k ok ok of of of of of 2k 2k ok 2k ok ok of of of 2 of 2k 2K ok ok ok ok ok ok ok ok 


cle; 
clear; 


%%%%% Execute GDA Polynomial Deg 2 %%%%DDOLD%%%%% 
load Mahalanobis_Rank_040818; 

Mah = Rank_718; 

av_Mah = mean( Mah(:,1) ) 


load Norm2_Rank_040817; 
N2 = Rank _ 718; 
av_N2 = mean( N2(:,1) ) 


load Mahalanobis _N2_ Rank 040820; 
Mah_N2 = Rank_718; 
av_Mah_N2 = mean( Mah_N2(:,1) ) 


load Mahalanobis_N1_Rank_040821; 
Mah_N1 = Rank_718; 
av_Mah_N1 = mean( Mah_N1(:,1) ) 


load Mahalanobis_Ang_ Rank_040822; 
Mah_Ang_1 = Rank_718; 
av_Mah_Ang_1 = mean( Mah_Ang_1(:,1) ) 


%%% It has both Rank & Perf per Class 

load Mahalanobis_Ang_Rank_040826; 

Mah_Ang_2 = Rank_718; 

av_Mah_Ang_ 2 = mean( Mah_Ang_ 2(:,1) ) 

T00%%%% GDA Polynom Deg 2 

ToT To Vo Yo Vo To Vo Yo Vo No Yo Vo No Vo Yo Vo No Yo Yo Vo To Vo Yo Yo No Yo Yo Vo Vo 


TTT VU%%%% + GDA Polynom Deg 1 
Joo To oT Vo Vo Vo Vo Vo Vo Vo Vo Mo Vo Yo Vo Vo Vo Yo Yo Yo Vo 
%%% Only GDA Pol Deg 1==> LDA 

%%% It has both Rank & Perf per Class 

load Norm2_Rank_ PerfClas_ LDA_040827; 
N2_GDAI1 = Rank _ 718; 

av_N2_GDAI1 = mean( N2_GDA\1(:,1) ) 

%%% Only GDA Pol Deg 1==> LDA 

%% It has only Rank 

load Mahalanobis_Ang Rank__LDA_040824; 
Mah_Ang GDAI = Rank_718; 

av_Mah_Ang GDA1 = mean( Mah_Ang GDA1(:,1) ) 
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%%%% First PCA then GDA Pol deg 1 <==> PCA + LDA 
%%% It has both Rank & Perf per Class 

load N2_Rank_PerfClas_PCAGDA1_040829; 
N2_PCAGDAI1 = Rank_718; 

av_N2_PCAGDAI = mean( N2_PCAGDA1(:,1) ) 


%%%% First PCA then GDA Pol deg 1 <==> PCA + LDA 
%%% It has both Rank & Perf per Class 

load Mah_Ang_Rank_PerfClas_PCAGDA1_040901; 

Mah_Ang PCAGDAI = Rank_718; 

av_Mah_Ang PCAGDA1 = mean( Mah_Ang PCAGDAI1(:,1) ) 
TTT V%%%%%% + GDA Polynom Deg 1 

ToT To To Yo Vo To Vo Yo To No Yo Yo Vo Vo Yo Uo Yo Vo Yo Vo Vo Vo 


TV %%0%%%% GDA Polynom Deg 2 VARIOUS NUMBER EIGENVEC- 
TORS 

%%%% GDA Pol deg 2 ==> 100 EigVec 

%%% It has both Rank & Perf per Class 

load Mah_Ang_Rank_PerfClas_GDA2_100EigVec_040911; 

Mah_Ang_ GDA2_100EV = Rank_718; 

av_Mah_Ang GDA2_100EV = mean( Mah_Ang_ GDA2_100EV(:,1) ) 





%%%% GDA Pol deg 2 ==> 150 EigVec 

%%% It has both Rank & Perf per Class 

load Mah_Ang_Rank_PerfClas_GDA_2_150EigVec_041027; 
Mah_Ang GDA2_150EV = Rank_718; 

av_Mah_Ang GDA2_150EV = mean( Mah_Ang_ GDA2_150EV(:,1) ) 





%%%% GDA Pol deg 2 ==> 200 EigVec 

%%% It has both Rank & Perf per Class 

load Mah_Ang_Rank_PerfClas_GDA_2_200EigVec_041026; 
Mah_Ang GDA2_200EV = Rank_718; 

av_Mah_Ang GDA2_200EV = mean( Mah_Ang_ GDA2_200EV(:,1) ) 





%%%% GDA Pol deg 2 ==> 250 EigVec 

%%% It has both Rank & Perf per Class 

load Mah_Ang_Rank_PerfClas_GDA_2_250EigVec_041025; 
Mah_Ang GDA2_250EV = Rank_718; 

av_Mah_Ang GDA2_250EV = mean( Mah_Ang GDA2_250EV(:,1) ) 





%%%% GDA Pol deg 2 ==> 500 EigVec 

%%% It has both Rank & Perf per Class 

load Mah_Ang_Rank_PerfClas_GDA2_500EigVec_040908; 
Mah_Ang GDA2_SO0EV = Rank_718; 

av_Mah_Ang GDA2_500EV = mean( Mah_Ang GDA2_500EV(:,1) ) 
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TTT VIF %%0%%GDA Polynom Deg 2 VARIOUS NUMBER EIGENVEC- 
TORS 


%%%% GDA Pol deg 3 

%%% It has both Rank & Perf per Class 

load Mah_Ang_Rank_PerfClas_GDA_3_041003; 
Mah_Ang GDA3 = Rank_718; 

av_Mah_Ang GDA3 = mean( Mah_Ang_ GDA3(:,1) ) 
%%%% GDA Pol deg 3 


%%% Find Confidence Intervals of the Rank Means— 

%%% Assume GAUSSIAN distr, default CI=95% 

%%% of the values of the Ranks 
[muhat_M,sigmahat,muci_M,sigmaci] = normfit( Mah ); 
[muhat_N2,sigmahat,muci_N2,sigmaci] = normfit( N2 ); 
[muhat_MN2,sigmahat,muci_MN2,sigmaci] = normfit( Mah_N2 ); 
[muhat_MN1,sigmahat,muci_MN1,sigmaci] = normfit( Mah_N1 ); 
[muhat_MAn,sigmahat,muci_MAn,sigmaci] = normfit( Mah_Ang_1 ); 


resol = 10; 

rank = 10; 

for k = 1 : rank; 
CI_M(:,k) = ( linspace( muci_M(1,k) , muci_M(2,k) , resol ) )'; 
CI_N2(:,k) = ( linspace( muci_N2(1,k) , muci_N2(2,k) , resol ) )’; 
CI_MN2(:,k) = ( linspace( muci_MN2(1,k) , muci_MN2(2,k) , resol ) )’; 
CI_MN1(:,k) = ( linspace( muci_MN1(1,k) , muci_MN1(2,k) , resol ) )’; 
CI_MAn(:,k) = ( linspace( muci_MAn(1,k) , muci_MAn(2,k) , resol ) )’; 

end 


figure; 

% % plot( muhat_MAn , 'r' ); 

axis((0,25,0.975,1]); grid on; hold on; 

title(‘Confidence Intervals of Means of first 5 Rank Scores’); 
xlabel(‘Rank Score Index');ylabel("Rank Score Mean’); 


%%%%% Create resolution for 95% Confidence Intervals 
for k = 1: resol; 

x1(k)=1; x2(k)=2; x3(k)=3; x4(k)=4; x5(k)=5; x6(k)=6; x7(k)=7; x8(k)=8; x9(k)=9; 
x10(k)=10; x11(k)=11; 

x12(k)=12; x13(k)=13; x14(k)=14; x15(k)=15; x16(k)=16; x17(k)=17; x18(k)=18; 
x19(k)=19; 

x20(k)=20; x21(k)=21; x22(k)=22; x23(k)=23; x24(k)=24; x25(k)=25; 
end; 


%%%% Plot 95% Confidence Intervals of first 5 Rank Scores 
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plot( 0 , CILM(,1), 'b', 0, CI_N2(1,1), 'g', 0, CILMN2(1,1), 'r', 0, CILMN1(1,1), 'm', 
0, CI_MAn(1,1), 'c' ); 

plot( x1 , CI_LM(,1), 'b*' , x6 , CI_M(:,2), 'b*', x11, CI_LM(:,3), 'b*', x16, CI_M(:,4), 
'b*' , x21 , CI_M(.,5), 'b*' ); 

plot( x2 , CI_N2(:,1), 'g*', x7 , CI_LN2(¢:,2), 'g*' , x12 , CI_N2(;,3), 'g*' , x17 , CI_N2(:,4), 
'g® | x22 , CI_N2(:,5), 'g*' ); 

plot( x3 , CILMN2(;,1), 'r*', x8 , CILMN2(:,2), 'r*' , x13 , CILMN2(:,3), 'r*', x18, 
CI_MN2(.,4), 'r*', x23 , CILMN2(:,5), 'r*' ); 

plot( x4 , CILMNI1(,1), 'm*', x9 , CILMN1(:,2), 'm*' , x14, CILMN1(:,3), 'm*', x19, 
CI_MN1(:,4), 'm*’ , x24 , CILMNI1(:,5), 'm*' ); 

plot( x5 , CILMAn(;,1), 'c*', x10 , CILMAn(:,2), 'c*', x15 , CILMAn(.,3), 'c*' , x20, 
CI_MAn(.,4), 'c*', x25 , CI_LMAn(:,5), 'c*' ); 

legend(‘CI-M’, 'CI-N2', 'CI-MN2', 'CI-MN1', 'CI-MAn' ) 


e PCAGDAI.m: 


To To 8 FS FH 2 He He He He He He He 2 2 2 2 a 2h 2 2 ae fe fe fe he fe fe he 6 26 26 26 28 26 28 28 28 28 28 2s 28 2s 2s 282 22 2 2 2 2 2 2 2 2 2 2 2 2 2 2k 2 2 2k ak 


%% Filename: PCAGDAI.m 
%% Thesis Advisor: Prof.M.P.Fargues Naval Postgraduate School 


%% Author: Captain Dimitrios I. Domboulas 

%% Hellenic Air Force 

%% Date: August 2004 

%% Description: Computes classification performance for the LDA 
%o% method (i.e. GDA with polynomial kernel deg—1) 
%% or of the Fisherface method (PCA + GDA-1) 

%% Parameters: 1. Method to be used (GDA-1 or Fisherface) 

%% 2. Number of K matrix eigenvectors kept 

%o% 3. Distance type to be used 

%o% 4. Number of iterations 

%% Outputs: 1. Rank—(1—50) Scores 

%o% 2. Classification Performance per Class 

%% Functions used: buildGDA_Opt.m, eigensystem.m, dataST.m, 
%o% spreadGDA_Opt.m, pca.m, sortem.m 

%% Fig 2g 2s FAS 2g 24s FAS fg 2s FAS 2g 2s 2S 2g 2s 2S fg 2s 2S 2fe 2s FAS fg 2s 2S 2g 2s 2S fg 24s 2S 2g 2s 2S 2g 2s 2S 2fe 2s 2S ois 2s 2S ois 2s FIs oie 2s 2S 2fs 2s 2s 2s is 2s 2k 2s 2s ois 2s 2s 2k 2 2K ok 
cle; 

clear; 


tic %% start timing 1000 iterations 


load A_all; 

clear img A Db Dme x ans img_name N_classjk mn s sl sqrsl teml tem2; 
clear IndSamples N_objects Section im_num1 im_num? Person T time v w; 
Atemp = double(A_all); 

clear A_all; 
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%% CONSTANTS ' DECLARATION 

Train_Class_Size = 18; %% Number of Training Images per Class 
Test_Class_Size = 12; %% Number of Testing Images per Class 
Num_Train_Imag = 900; %% Total Number of Training Images 
Num_Test_Imag = 600; %% Total Number of Testing Images 

Num_Class = 50; %% Total Number of Classes 

cs = [18,18,18,18,18,18,18,18,18,18]; 

Class_Sizes = [cs, cs, cs, cs, cs]; %%(1X50) vector, with class sizes per class 
sel_method = 1; %%%Use 0 for GDA-—1 and anything else for Fisherface 
degree = 1; %%% ALWAYS Polynomial kernel deg 1 FOR GDA-1==>LDA 
Num_ev = 0; %%% input Number of K matrix eigenvectors (0 => Default) 
select_dist = 1; %%% select distance 

Num_Iter= 1; %%% Number of MAIN LOOP ITERATIONS 


h=waitbar(0," MAIN ITERATION LOOP’); %%%% WAITBAR 
for loop = 1: Num_Iter; %%%%% MAIN ITERATION LOOP %%%%%%%%%% 
waitbar(loop/Num_Iter , h); %%%% WAYITBAR 


JoTo To ToT No Vo Vo Vo To Vo Vo Vo Vo Yo Lo Vo Vo Yo To No Yo Vo Yo To Vo Vo Vo Vo Yo Yo Vo Vo Yo To No Yo Vo Vo Yo Vo Vo Yo 
%%%% Random Permutation Section Borrowed from [Lee, 2004] and modified by 
%%%% [DOMBOULAS, 2004]. 


A = Atemp; %% A contains the Training Images (2700x900) 
B = []; %% B contains the testing images (2700x600) 
Delete = []; 


for 1=1 : 150 % 1 goes from 1 to the total number of pictures 
rp = randperm(10); 
Li = (-1) * 10+ 1; 
L2 = i*10; 
A(:,L1:L2)=A(:, G1) * 10 +1rp ); %% permute images 
Index = [L1] L1+1 L142 L143]; 
B =[B AC: Index )]; 
Delete = [Delete Index]; 
end 
AC: , Delete ) = []; 
%%%% Section Borrowed from [Lee, 2004] and modified by [DOMBOULAS, 2004]. 
ToT To To Vo Vo Vo Vo Yo To Vo Yo Yo Vo Vo Yo Uo To Vo Yo Yo To Vo Yo Vo Vo Vo Yo Vo To Yo Yo Vo Vo Vo Yo Yo Vo Yo Vo V0 Vo Yo 


%%%%% Run GDA-1 OR PCA first, then GDA-—1 (Fisherface) %%%%% % % % Go % Vo 
%%%%% Run GDA-1 %%%%%%% 

if sel_method == 0; 

A = DataSt( A ); %% Normalize Data—> Mean =0, Standard Deviation = 1 

B = DataSt( B ); %% Normalize Data—> Mean =0, Standard Deviation = 1 

Le = A'; %%% Learning Data (900 Images are rows) 

Te = B'; %% 600 Testing images of our Data Base (Images are rows) 
[dataGDA,centeredkM,kM] = buildGDA_Opt( Le , Class_Sizes , degree, Num_ev ); 
%%run GDA 
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Pgda_A = spreadGDA_Opt( Le, Le, dataGDA, degree ); %% Projected Training images 
in GDA 

Pgda_B = spreadGDA_Opt( Te, Le, dataGDA, degree ); %% Projected Testing images in 
GDA 


else %%%% Run Fisherface 

ToT VV V%%% PCA 

[Wpca,m,Amean,Ad] = pca(A); %%%Apply PCA to IrisData 

PApca = Wpca' * (Ad); %% Compute the PCA projection matrix of IrisData 
PBpca = Wpca' * (B — Amean*ones(1,Num_Test_Imag)); 


%%%%% GDA Pol Deg 1 ==> LDA 

Le = PApca'; %%% Learning Data (900 Images are rows) (DO NOT NORMALIZE 
DATA) 

Te = PBpca'; %%% 600 Testing images of our Data Base (Images are rows) (DO NOT 
NORMALIZE DATA) 

[dataGDA,centeredkM,kM] = buildGDA_Opt( Le , Class_Sizes , degree, Num_ev ); 
%%run GDA, Polyn Deg 1 

Pgda_A = spreadGDA_Opt( Le, Le, dataGDA , degree ); %% Projected Train images in 
GDA, Polyn Deg 1 

Pgda_B = spreadGDA_Opt( Te, Le, dataGDA , degree ); %% Projected Test images in 
GDA, Polyn Deg 1 

end; %%% end if 

Jo%o To V0 % Vo V0 Vo Vo Vo Vo Yo Lo VO VO NOVO %%% PCA LDA %%%%GUL6OLLG%%% 


%%% FIND VARIANCE of each column of the 900X49 Projected Train Data Matrix. 
%%% Covariance is used for other Image Distance types 
fork = 1: size( Pgda_A , 2 ); 
Pr_Tr_Dat_Cov_Col(k) = cov( Pgda_A(:,k) ); %% Row Vector 1X49 with variances 
%%% of each of the 49 columns of the Projected Training Data Matrix 900X49 
end; 
%%%% 49X49 covariance matrix of Proj Train Data Matrix 
Pr_Tr_Dat_Cov_Mat = cov(Pgda_A); 
%%%%%% It is Diagonal! !Invertible & Each Dimension 's features are 
%%%%%% independent from each othert ==> It is the Ideal case 
Inv_Cov = inv( Pr_Tr_Dat_Cov_Mat ); %%% Inverse Covar Matr 
%%% For the Approximated Mahalanobis Distance 
for k = 1 : size( Inv_Cov,2 ); 
s_inv(k) = Inv_Cov(k,k); %% ROW VECTOR 1 x 49 
end; 
sr_s_inv = sqrt(s_inv(k)); %% Row Vector 
sr_inv_diag = diag( sr_s_inv ); %%Diagonal Matrix with diag elem the 
%%% square roots of the inverse diag elem of cov matrix 49 x 49 
TT VV FOV VV %%%. END VARIANCE SECTION %%%%%%%%%%% 











%% Find Centroids per Each Class Projected Training Images 
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for k= 1 : Num_Class 

ProjTrainCentr( k , : ) = mean( Pgda_A( ((k—1)*Train_Class_Size + 1) : (k * 
Train_Class_Size ),: ) ); 

end; 


%%% create cell array 
DistName = {'Norm—2 's"Mahalanobis ee 
"Mahalanobis Norm—2 ';'Mahalanobis Norm-1 ';'Mahalanobis Angular'}; 


%%% Find Various Distances for Testing Images 
fork = 1 : Num_Test_Imag ; 
for m= 1: Num_Class; 


if select_dist == 1; %% NORM-2 Distance 
TestDist(k,m) = norm( Pgda_B(k,:) — ProjTrainCentr(m,:),2_); 


elseif select_dist == 2; %% MAHALANOBIS DISTANCE 
TestDist(k,m) = sqrt( (Pgda_B(k,:) — ProjTrainCentr(m,:)) * Inv_Cov * (Pgda_B(k,:) — 
ProjTrainCentr(m,:))' ); %% Mahalanobis Distance 


elseif select_dist == 3; %% Mahalanobis NORM-2 Distance 

temp = ( sr_s_inv .* ( Pgda_B(k,:) — ProjTrainCentr(m,:) ) ) * ( sr_s_inv .* ( 
Pgda_B(k,:) — ProjTrainCentr(m,:) ) )' ; 

TestDist(k,m) = sqrt( sum( temp ) ); %% Approxim Mahalan Dist 


elseif select_dist == 4; %% Mahalanobis NORM-—1 Distance 
TestDist(k,m)=sr_s_inv .* norm( Pgda_B(k,:) — ProjTrainCentr(m,:),1 ); 


else %%%% MAHALANOBIS ANGULAR DISTANCE 

TestDist(k,m) = acos( ( ( Pgda_B(k,:) * sr_inv_diag ) * ( sr_inv_diag * ProjTrain- 
Centr(m,:)' ) ) / (norm( Pgda_B(k,:).*sr_s_inv ,2 ) * norm( ProjTrain- 
Centr(m,:).*sr_s_inv ,2 ) ) ); 


end; %%% end if 


end 
end 


%%% Assign 1-50 Class Indeces for each of the 600 test Images 

T_B = [1,1,1,1,1,1,1,1,1,1,1,1]; 

form = 1: Num_Class; 

T_B( ( Test_Class_Size * (m—1) + 1 ) : ( Test_Class_Size *m) )=m * T_B( 
1:Test_Class_Size) ; 

end; %%% T_B contains the class number of each testing image (1X600) 


%%% Sort Distances per Testing Image - RANK 
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for k = 1 : Num_Test_Imag ; 

temp! = sort( TestDist(k,:) ); 

I(k) = find( (temp1 == TestDist(k,T_B(k))*ones(1,Num_Class)) ); 
end; 


%%% Performance Measurement Per Class 
for k= 1: Num_Class; 
temp2 = find( I( ((k—1)*Test_Class_Size + 1) : (k * Test_Class_Size ) ) == ones( 
1,Test_Class_Size ) ); 
Perf_Class(k) = size( temp2 , 2 ) / Test_Class_Size; 
end; 


%%% Find the probability of each of the 50 Ranks assigned to all the testing images 
for k =1 : Num_Class ; 

temp3 = find( I == k * ones( 1,Num_Test_Imag ) ); 

Prob(k) = size( temp3 , 2 ) / Num_Test_Imag; 
end; 


%%%% RANK SCORE Evaluation — Cummulative Probability of Ranks — Increasing 
Order 
Rank(1) = Prob(1); 
for k = 2 : Num_Class ; 

Rank( k ) = Rank( k—1 ) + Prob( k ); %9% Rank is 1X50 with accumulated Probability 
end; 


Rank_718(loop,:) = Rank; 
Perf_Class_718(loop,:) = Perf_Class; 


clear A TA B TB f L1 L2 Le Te dataGDA centeredkM kM; 
clear Pgda_A Pgda_B k j Pr_Tr_Dat_Cov_Col Pr_Tr_Dat_Cov_Mat; 
clear TrainDist TestDist temp Pgda_Aindex Class_Index; 

clear Inv_Cov s_inv sr_inv_diag ProjTrainCentr DistName; 

clear Pgda_BIndex Class_Index T_B temp1 I temp2 temp3; 

clear Prob Delete d diml dim2 i m nsr_s_inv; 

clear Rank Perf_Class Pgda_AIndex TestDist cs rp; 











% % save Mah_Ang Rank _PerfClas_GDA_0_7_040926 Rank_718 Perf_Class_718 ; 
save Mah_Ang_ Rank_PerfClas_GDA_1_041116 Rank_718 Perf_Class_718 ; 


end; %%%%%% END OF MAIN ITERATION LOOP %%%%%%%%%%%% 


close(h); TTT VV % V0 FoF FO%O%o ~=W AITBAR 
time(loop) = toc; %% Stop timing each Main Iteration Loop 
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APPENDIX C. RANK SCORE PLOTS 


In this appendix, the following typical random rank score plots are provided. Each 


was derived from one random iteration of the program “DistanceSelection.m’, listed in 


Appendix B. 


Curmmulative Rank Score for 600 Testing Images 
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Rank score plot example 1. 


Figure C.1. 
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Cummulative Rank Score for 600 Testing Images 
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Rank score plot example 2. 


Figure C.2. 
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Cummulative Rank Score for 600 Testing Images 
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Rank score plot example 3. 


Figure C.3. 
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